FORM PTO-1390 
(REV 11-98) 



U.S. DEPARTMENT OF COMMERCE PATENT AND TRADEMARK OFFICE



ATTORNEY'S DOCKET NUMBER 

36-1395 



TRANSMITTAL LETTER TO THE UNITED STATES 
DESIGNATED/ELECTED OFFICE (DO/EO/US) 
CONCERNING A FILING UNDER 35 U.S.C. 371 



U.S. APPLICATION NO. (If known, see 37 C.F.R. 1.5)



INTERNATIONAL APPLICATION NO. 
PCT/GB99/02449 



INTERNATIONAL FILING DATE
27 July 1999 



PRIORITY DATE CLAIMED 
5 August 1998 



TITLE OF INVENTION 

DATA PROCESSING APPARATUS AND METHOD FOR OPTIMISING CONFIGURATION PARAMETERS OF A PHYSICAL SYSTEM



APPLICANT(S) FOR DO/EO/US

OATES



Applicant herewith submits to the United States Designated/Elected Office (DO/EO/US) the following items and other information:

1. 

This is an express request to begin national examination procedures (35 U.S.C. 371(f)) at any time rather than delay
examination until the expiration of the applicable time limit set in 35 U.S.C. 371(b) and PCT Articles 22 and 39(1).

A proper Demand for International Preliminary Examination was made by the 19th month
from the earliest claimed priority date.

5. A copy of the International Application as filed (35 U.S.C. 371(c)(2)).

S is transmitted herewith (required only if not transmitted by the International Bureau). 
S has been transmitted by the International Bureau. 

□ is not required, as the application was filed in the United States Receiving Office (RO/US). 
6. □ A translation of the International Application into English (35 U.S.C. 371(c)(2)).

□ Amendments to the claims of the International Application under PCT Article 19 (35 U.S.C. 371(c)(3)).

□ are transmitted herewith (required only if not transmitted by the International Bureau). 

□ have been transmitted by the International Bureau. 

□ have not been made; however, the time limit for making such amendments has NOT expired. 
□ have not been made and will not be made.








(35 U.S.C. 371(c)(5)). 



37 C.F.R. 3.28 and 3.31 is included. 
A FIRST preliminary amendment. 



Copies of Amended Pages 1 and 59, and pages 60 through 72 (claims 1 through 14) 



JC07 Rec'd PCT/PTO 11 JAN 2001






17. ☒ The following fees are submitted:

BASIC NATIONAL FEE (37 C.F.R. 1.492(a)(1)-(5)):

Neither international preliminary examination fee (37 C.F.R. 1.482)
nor international search fee (37 C.F.R. 1.445(a)(2)) paid to USPTO
and International Search Report not prepared by the EPO or JPO        $1000.00

International preliminary examination fee (37 C.F.R. 1.482) not paid to
USPTO but International Search Report prepared by the EPO or JPO      $860.00

International preliminary examination fee (37 C.F.R. 1.482) not paid to USPTO
but international search fee (37 C.F.R. 1.445(a)(2)) paid to USPTO    $710.00

International preliminary examination fee paid to USPTO (37 C.F.R. 1.482)
but all claims did not satisfy provisions of PCT Article 33(1)-(4)    $690.00

International preliminary examination fee paid to USPTO (37 C.F.R. 1.482)
and all claims satisfied provisions of PCT Article 33(1)-(4)          $100.00

                                                        CALCULATIONS (PTO USE ONLY)

ENTER APPROPRIATE BASIC FEE AMOUNT =                                  $ 860.00

Surcharge of $130.00 for furnishing the oath or declaration later than □ 20 □ 30
months from the earliest claimed priority date (37 C.F.R. 1.492(e)).      0.00

CLAIMS                NUMBER FILED    NUMBER EXTRA    RATE
Total Claims          41 - 20 =       21              $18.00            378.00
Independent Claims       -  3 =                       $80.00              0.00
MULTIPLE DEPENDENT CLAIM(S) (if applicable)           $270.00             0.00

TOTAL OF ABOVE CALCULATIONS =                                          1238.00

Reduction by 1/2 for filing by small entity, if applicable. Small entity status must also be asserted.
(Note 37 C.F.R. 1.9, 1.27, 1.28).                                         0.00

SUBTOTAL =                                                             1238.00

Processing fee of $130.00 for furnishing the English Translation later than □ 20 □ 30
months from the earliest claimed priority date (37 C.F.R. 1.492(f)).      0.00

TOTAL NATIONAL FEE =                                                   1238.00

Fee for recording the enclosed assignment (37 C.F.R. 1.21(h)). The assignment must be
accompanied by an appropriate cover sheet (37 C.F.R. 3.28, 3.31). $40.00 per property
                                                                         40.00

Fee for Petition to Revive Unintentionally Abandoned Application ($1240.00 - Small Entity = $620.00)
                                                                      $   0.00

TOTAL FEES ENCLOSED =                                                 $ 1278.00

Amount to be refunded / charged:                                      $


a. ☒ A check in the amount of $1278.00 to cover the above fees is enclosed.

b. □ Please charge my Deposit Account No. 14-1140 in the amount of $ to cover the above fees. A duplicate copy of this
form is enclosed.

c. ☒ The Commissioner is hereby authorized to charge any additional fees which may be required, or credit any overpayment to
Deposit Account No. 14-1140. A duplicate copy of this form is enclosed.

d. □ The entire content of the foreign application(s) referred to in this application is/are hereby incorporated by reference in this
application.

NOTE: Where an appropriate time limit under 37 C.F.R. 1.494 or 1.495 has not been met, a petition to revive (37 C.F.R.
1.137(a) or (b)) must be filed and granted to restore the application to pending status.



SEND ALL CORRESPONDENCE TO: 

NIXON & VANDERHYE P.C. 
1100 North Glebe Road, 8th Floor
Arlington, Virginia 22201 
Telephone: (703) 816-4000 




SIGNATURE

Larry S. Nixon
NAME

25,640
REGISTRATION NUMBER

January 11, 2001
Date



09/743520 

JC07 Rec'd PCT/PTO 11 JAN 2001
IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

In re Patent Application of 

OATES Atty. Ref.: 36-1395 

Serial No. Unknown Group: 

National Phase of PCT/GB99/02449

For DATA PROCESSING APPARATUS AND METHOD FOR
OPTIMISING CONFIGURATION PARAMETERS OF A
PHYSICAL SYSTEM

***********

January 11, 2001 

Assistant Commissioner for Patents 
Washington, DC 20231 

Sir: 

PRELIMINARY AMENDMENT 

Prior to calculation of the filing fee and in order to place the above identified 
application in better condition for examination, please amend the claims as follows: 
IN THE CLAIMS (AS ON AMENDED SHEETS) 

Claim 3, lines 1 and 2, delete "or claim 2".
Claim 4, lines 1 and 2, delete "any preceding claim" and insert -- claim 1 --.
Claim 7, lines 1 and 2, delete "any one of claims 1 to 3" and insert -- claim 1 --.
Claim 10, lines 1 and 2, delete "any one of claims 1 to 9" and insert -- claim 1 --.
Claim 12, lines 1 and 2, delete "any one of claims 1 to 9" and insert -- claim 1 --.
Claim 14, lines 1 and 2, delete "or claim 13".
Claim 16, lines 1 and 2, delete "any one of claims 12 to 15" and insert -- claim 12 --.
Claim 17, lines 1 and 2, delete "any one of claims 1 to 9" and insert -- claim 1 --.
Claim 20, lines 1 and 2, delete "any one of claims 17 to 19" and insert -- claim 17 --.






Claim 23, line 1, delete "or claim 22".
Claim 24, line 1, delete "any one of claims 21 to 23" and insert -- claim 21 --.
Claim 27, line 1, delete "any one of claims 21 to 23" and insert -- claim 21 --.
Claim 30, line 1, delete "any one of claims 21 to 29" and insert -- claim 21 --.
Claim 32, line 1, delete "any one of claims 21 to 29" and insert -- claim 21 --.
Claim 34, line 1, delete "either of claims 32 or 33" and insert -- claim 32 --.
Claim 36, line 1, delete "any one of claims 32 to 35" and insert -- claim 32 --.
Claim 37, line 1, delete "any one of claims 21 to 29" and insert -- claim 21 --.
Claim 39, line 1, delete "either claims 37 or 38" and insert -- claim 37 --.
Claim 40, line 3, delete "any one of claims 21 to 39" and insert -- claim 21 --.
Claim 41, lines 2 and 3, delete "any one of claims 21 to 39" and insert -- claim 21 --.



The above amendments are made to place the claims in a more traditional

The present invention generally relates to a data
processing method and apparatus for determining optimum parameters
of the model of a physical system and for controlling the
configuration of the physical system.

The generation of a model of a physical system is used for a
wide variety of applications, including learning more about the
behaviour of the system and controlling the parameters of the
system. The model of the physical system can be used to try out
parameters in order to determine optimum parameters which can then
be used for controlling the physical system. In this way the
effect of the parameters on the physical system can be predicted
so that only optimum parameters are chosen for use in the control
of the physical system.

The theory and motivation behind optimisation of systems has
been considered by many workers, notably David E Goldberg. In
"Genetic Algorithms in search, optimisation and machine learning",
Goldberg presents the goals of optimisation as an "improvement of
system performance towards some optimal point or points". Goldberg
describes several approaches that can be applied to achieve
improvements in performance and presents methods for measuring
system performance. The approaches presented include
representation of a problem - coding of parameter sets defining
the problem - and the application of genetic operators such as
mutation and cross over to the coded parameter sets.

One such physical system for which configuration parameters
are required is a distributed database. Because the use of
corporate intranets continues to rise, the management of efficient
access to data is becoming a key issue. Large information systems,
often accessed on a global basis, are increasingly being provided
by distributed means with use of "mirror sites" to ease congestion
and improve local access. The configuration of the information
system defining the location of the
information and the servers to which clients are to
direct queries is important in order to provide a system
which has a high performance as perceived by the clients.
Further, where access is truly global, static design
solutions are not ideal, as the largest source of load
on the system moves from geographic area to geographic
area at different times of the day. This tends to create
congestion that is "localised" in the region of demand.
Localised mirroring in one region only will simply shift
the contention onto the global communications networks
when demand shifts to other regions. Mirroring in all
regions is not only costly, but also leads to over
duplication of data, greater problems with integrity
and the need to perform multiple updates of the data.

Where the data is required to be updated as
well as read, the administration of these information
systems can become significant and labour intensive. In
a recent paper by MJ Oates et al entitled "Investigating
Evolutionary Approaches for Self-Adaption in Large
Distributed Databases" (Proceedings of the 1998 IEEE
International Conference on Evolutionary Computation) and
in a paper by G Bilchev et al entitled "Comparing
Evolutionary Algorithms and Greedy Heuristics for
Adaption Problems" (Proceedings of the 1998 IEEE
International Conference on Evolutionary Computation) it
has been shown that autonomous management of distributed
information systems is feasible, and in these publications
the use of various types of algorithms for the adaption
of distributed databases has been considered.

The present invention provides an improved method
and apparatus for determining optimum parameters of the
model of a physical system wherein at least one initial
string of values representing the parameters of the model
to be optimised is obtained; a cost value associated with
the model having parameters represented by the or each
string of values is determined; a new string of values
is repeatedly generated by a) selecting a sequence of
values of random length starting at a random position in
a string of values, b) replacing a sequence of values
of the same length in a string of values at a random
position with the selected sequence, and c) changing the
value of one or more of the values of the resulting string
of values to generate a new string of values; a cost value
associated with the model having parameters represented by
the new string of values is determined; and the optimum
parameters are determined as the one of the initial or new
strings of values for which the cost value associated with
the model is closest to a target, such as
a maximum or a minimum.

Using this technique new strings of values
representing new parameters of a model are produced which
can be applied to the model to see whether a cost value
closer to a target, e.g. a minimum or a maximum, is
produced. The technique thus provides a novel search
technique to search for the optimum parameters.
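
Steps a) to c) above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the specification: the function name `new_string`, its parameter names, the requirement that both strings have the same length, and the choice of changing exactly one value are all hypothetical. In this sketch the segment is not allowed to run past the end of the string.

```python
import random


def new_string(donor, recipient, values, rng):
    """Overlay a random segment of `donor` onto `recipient`, then change one value.

    `donor` and `recipient` are equal-length sequences (an assumption of this
    sketch); `values` is the set of values a position may take.
    """
    n = len(recipient)
    length = rng.randint(1, n)          # a) random segment length, 1..n
    src = rng.randint(0, n - length)    # a) random start position in the donor
    dst = rng.randint(0, n - length)    # b) random position in the recipient
    child = list(recipient)             # the recipient itself is left untouched
    child[dst:dst + length] = donor[src:src + length]
    pos = rng.randrange(n)              # c) change the value at one position
    child[pos] = rng.choice(values)     # (may happen to pick the same value)
    return child


rng = random.Random(0)
child = new_string([1] * 8, [0] * 8, [0, 1, 2], rng)
```

Passing the same string as both `donor` and `recipient` gives the single-parent form of the operation; passing two different strings gives the two-parent form.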

This technique is particularly applicable for the
control of the configuration of a physical system wherein
the model models the physical system and the new
parameters generated represent new untried configurations
of the physical system. The cost value output from the
model can be used to determine whether the new
configurations are an improvement or not. Once the
optimum parameters have been determined they can be used
to control the configuration of the physical system.

The present invention is particularly suited to the
problem of configuring a distributed database comprising
a plurality of data servers connected over a network,
each data server holding any number of data units, and
a plurality of clients connected over the network to the
data servers, each of said clients being adapted to
retrieve data units from and/or update data units at one
or more of the data servers. When the invention is
applied to the configuration of a distributed database,
the configuration parameters define which data server is
to be accessed by which client, and the distributed
database is modelled using the configuration parameters
and information on the passage of data within the
distributed database to determine a performance value for
each of the strings of values. Whichever of the strings
of values has the best performance value is then used
for configuring the distributed database to control which
data server is to be accessed by which client. The
performance value is representative of the performance
perceived by one or more of the clients. In this
aspect of the present invention a client comprises any
application running on a computer connected to the
network which requires access to the data units stored
on the data servers. Where the performance value
represents the transaction time for one or more clients
accessing a data unit on a data server, the best
performance value is a minimum and thus the technique of
this embodiment of the present invention will attempt to
minimise the performance value, i.e. the transaction time.
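
As an illustration of how a string of values can encode "which client uses which server" and how a model can turn it into a performance value, the following Python sketch uses a deliberately simple, hypothetical cost model. The names `transaction_times` and `performance_value`, the `latency` and `service_time` tables, the linear load term and the use of the worst client's time are all assumptions of the sketch; the passage above does not specify the model.

```python
def transaction_times(assignment, latency, service_time):
    """Return a hypothetical transaction time for every client.

    assignment[c]   -- index of the data server used by client c
    latency[c][s]   -- assumed network delay between client c and server s
    service_time[s] -- assumed time server s needs per client it serves
    """
    load = [0] * len(service_time)
    for server in assignment:
        load[server] += 1               # count the clients directed to each server
    return [latency[c][s] + service_time[s] * load[s]
            for c, s in enumerate(assignment)]


def performance_value(assignment, latency, service_time):
    """Performance value of a configuration: the slowest client's time (lower is better)."""
    return max(transaction_times(assignment, latency, service_time))


latency = [[1, 5], [1, 5], [4, 2]]      # three clients, two servers
service_time = [2, 3]
spread = performance_value([0, 0, 1], latency, service_time)    # 5
crowded = performance_value([0, 0, 0], latency, service_time)   # 10
```

Under these assumed figures, directing the third client to the second server gives the lower (better) performance value, so that string of values would be preferred.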

Although the present invention is particularly
applicable to optimising the configuration of a
distributed database, it can be applied to any physical
system or model wherein parameters of the model can be
provided as a string of values. The values can either
comprise actual numerical values of the model or can
comprise numerical values representing features of the
model, i.e. they comprise or represent symbols.

The present invention can be applied generally to
the configuration of a distributed processing system, of
which a distributed database is but one example, wherein
the configuration parameters determine which nodes
(servers) in the network are to be used for processing
requested by applications (clients).



In a preferred embodiment of the present invention
the string of values is considered to wrap round to form
a continuous string such that the last and first values
of the string are sequential. In this way, when a
sequence of values of random length is selected in a
string of values the sequence can include the last and
first values sequentially. Similarly, when a sequence
of values is replaced in a string of values this can
include the sequential last and first values.
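
The wrap-round behaviour amounts to indexing the string modulo its length. A short Python sketch, in which the function name `overlay_wrapped` and its arguments are hypothetical and both strings are assumed to have the same length:

```python
def overlay_wrapped(donor, recipient, src, dst, length):
    """Copy `length` values from `donor` (starting at `src`) over `recipient`
    (starting at `dst`), treating both strings as circular."""
    n = len(recipient)
    child = list(recipient)
    for k in range(length):
        # The modulo makes the position after the last value the first value.
        child[(dst + k) % n] = donor[(src + k) % n]
    return child


donor = list("ABCDEF")
recipient = list("abcdef")
# The selected sequence E, F, A spans the end of the donor; it is written to
# positions 5, 0 and 1 of the recipient.
child = overlay_wrapped(donor, recipient, src=4, dst=5, length=3)
# child == ['F', 'A', 'c', 'd', 'e', 'E']
```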

By selecting a sequence of values and overlaying it,
the intention is that good segments of the string of
values in one part of the string may also work well as a
building block elsewhere in the string. This technique
allows the values which may have become "extinct" in
positions in the population of strings to be "reinjected",
thereby enhancing the searching technique. The technique
for generating the initial parent strings of values does
not appear to be important. In one embodiment these are
generated randomly.

The present invention is applicable to evolutionary
computational search techniques and other conventional
search techniques. For example, the technique of the
present invention can be applied to a genetic algorithm
wherein a population of strings of values is created and
new strings of values are created from two parents. The
technique of the present invention when applied to
genetic algorithms is a cross-over technique wherein a
sequence of values of random length is selected at a
random start position in a first string of values and
this is used to replace a sequence of values of the same
length in a second string of values at a random position.
One or more of the values of the resulting string of
values is then mutated to generate the new string of
values. This basic technique can be used in any sort of
genetic algorithm such as a Breeder genetic algorithm or
a Tournament genetic algorithm.
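
To show where a two-parent operator of this kind sits inside a genetic algorithm, the following Python sketch performs one step of a three-way tournament. The tournament rules used here (three strings drawn at random, the two with the lowest cost act as parents, the child replaces the third) are an assumption made for illustration and are not taken from the text above; `tournament_step`, `cost` and `make_child` are hypothetical names, with `make_child` standing in for the cross-over and mutation just described.

```python
import random


def tournament_step(population, cost, make_child, rng):
    """Run one three-way tournament on `population` (modified in place).

    cost(string)            -- returns the cost value of a string (lower is better)
    make_child(p1, p2, rng) -- returns a new string built from two parents
    """
    picks = rng.sample(range(len(population)), 3)    # three distinct members
    picks.sort(key=lambda i: cost(population[i]))    # lowest cost first
    best, second, worst = picks
    # The two better strings act as parents; the child replaces the loser.
    population[worst] = make_child(population[best], population[second], rng)
    return population


rng = random.Random(1)
population = [[3, 3], [1, 0], [2, 2], [0, 0]]
# A trivial stand-in operator that copies the first parent, for demonstration.
tournament_step(population, sum, lambda p1, p2, r: list(p1), rng)
```

Passing the operator in as `make_child` keeps the selection scheme separate from the way new strings are generated, so the same step can be reused with a different operator.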

As mentioned above, the present invention is
applicable to non-evolutionary search techniques in which
a single string of values is used as a parent from which
a new string of values is generated. In this method the
sequence of values of random length is selected starting
at a random position in the parent string of values and
this is used to replace a sequence of values of the same
length in the parent string of values at a random
position. One or more of the values of the resulting
string of values is then changed to generate the new
string of values. This basic technique can be used in
a Hill Climbing search technique or in a Simulated
Annealing search technique, for example.
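
A single-parent search of this kind might look as follows in Python. This is a sketch under stated assumptions: `self_overlay` and `hill_climb` are hypothetical names, exactly one value is changed per new string, the string is treated as circular, and a new string is accepted whenever its cost is no worse than the current best. The acceptance rule in particular is a common hill-climbing choice and is not taken from the text above.

```python
import random


def self_overlay(parent, values, rng):
    """Copy a random segment of `parent` over another part of itself, then change one value."""
    n = len(parent)
    length = rng.randint(1, n)
    src, dst = rng.randrange(n), rng.randrange(n)
    child = list(parent)
    for k in range(length):
        # Read from the unmodified parent so overlapping segments are safe.
        child[(dst + k) % n] = parent[(src + k) % n]
    child[rng.randrange(n)] = rng.choice(values)
    return child


def hill_climb(start, cost, values, steps, rng):
    """Keep a single parent string and replace it whenever a child is no worse."""
    best, best_cost = list(start), cost(start)
    for _ in range(steps):
        child = self_overlay(best, values, rng)
        child_cost = cost(child)
        if child_cost <= best_cost:
            best, best_cost = child, child_cost
    return best, best_cost


rng = random.Random(0)
# Toy cost: the sum of the values, to be minimised.
best, best_cost = hill_climb([3] * 6, sum, [0, 1, 2, 3], steps=200, rng=rng)
```

A Simulated Annealing variant would differ only in the acceptance test, occasionally accepting a worse child with a probability that falls over time.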

Since the strings of values comprise a number of
values, each of which is a value for a particular
parameter, the strings of values can be termed solution
vectors since they can be represented as n-dimensional
vectors and the "best" one will comprise the solution to
the problem of finding the best configuration parameters.

Embodiments of the present invention will now be
described with reference to the accompanying drawings,
in which:

Figure 1 is a schematic illustration of the use of
the technique of the present invention to determine
optimum parameters of a model of a physical system;

Figure 2 is a schematic diagram of use of the
technique of Figure 1 to control the configuration of a
physical system;

Figure 3 is a schematic diagram illustrating the use
of the technique to control the configuration of a
distributed database in accordance with one embodiment
of the present invention;

Figure 4 is a schematic diagram of a distributed
database in accordance with one embodiment of the present
invention;

Figure 5 is a schematic diagram of a distributed
database in accordance with another embodiment of the
present invention;

Figure 6 is a schematic diagram of a distributed
database in accordance with a further embodiment of the
present invention;

Figure 7a is a flow diagram illustrating a retrieval
operation;

Figure 7b is a flow diagram illustrating an update
operation;

Figure 8 is a flow diagram illustrating steps in
adapting the distributed database;

Figure 9 is a schematic diagram of a distributed
database in accordance with a first scenario;

Figure 10 is a schematic illustration of a solution
vector;

Figure 11 is a flow diagram of the steps in a first
"basic" method for using a model to determine a cost
value;

Figure 12 is a flow diagram of a Breeder genetic
algorithm for determining the lowest cost value;

Figure 13 is a flow diagram of a single three way
Tournament genetic algorithm for determining the lowest
cost value;

Figure 14 is a flow diagram of the steps for
generating a new solution vector in accordance with an
embodiment of the present invention;

Figure 15 is a schematic illustration of the process
for generating a new solution vector in accordance with
the flow diagram of Figure 14;

Figure 16 is a schematic illustration of a
distributed database of a second scenario (B);

Figure 17 is a flow diagram illustrating the steps
of a second method termed "just used" for using the model
to determine a cost value;

Figure 18 is a flow diagram illustrating the steps
of a third method termed "plus average" for using the
model to determine a cost value;

Figure 19 is a flow diagram illustrating the steps
of a fourth method termed "plus used" for using the model
to determine a cost value;

Figure 20 is a flow diagram illustrating the steps
of a fifth method termed "just used" for using the model
to determine a cost value;

Figure 21 is a flow diagram illustrating the steps
of a sixth method termed "plus all" for using the model
to determine a cost value;

Figure 22 is a flow diagram illustrating the steps
of a seventh method termed "plus 10%" for using the model
to determine a cost value;

Figure 23 is a flow diagram illustrating a Hill
Climbing algorithm;

Figure 24 is a flow diagram illustrating a Simulated
Annealing algorithm;

Figure 25 is a flow diagram illustrating the steps
of one embodiment of the present invention for generating
a new solution vector;

Figure 26 is an illustration of the technique of
Figure 25 for generating a new solution vector;

Figure 27 is an illustration of the results for five
different optimisation techniques for scenario A using
the basic, least worst server model;

Figure 28 is an illustration of the results for the
five different optimisation techniques for scenario A
using the "plus all" model;

Figure 29 is an illustration of the results for the
five different techniques for scenario B using the basic,
least worst model; and

Figure 30 is an illustration of the results for the
five different techniques for scenario B using the "plus
all" model.

Figure 1 is a schematic illustration of the
principles of one aspect of the present invention. A
modeller 50 operates to generate a model of a physical
system. In order to generate the model the modeller 50
receives configuration parameters from an optimiser 60.
Thus, the optimiser 60 controls the configuration of the
model generated by the modeller 50. As a result of the
generated model, the modeller 50 outputs a cost value to
the optimiser 60. The cost value can represent any
function of the model which it is desired to, for
example, reduce or increase. For example, in the design
of a system it may be desirable for the cost value to
represent the energy consumption of the system which is
to be reduced or the performance of the system which is
to be improved (increased). The optimiser 60 operates
a form of a search algorithm in order to generate a
string of values representing the configuration
parameters. The string of values is termed a solution
vector since it can be considered to be an n-dimensional
vector having values representing the configuration
parameters. The optimiser 60 adaptively changes the
configuration parameters in order to optimise the cost
value, i.e. either reduce it or increase it. The search
for a solution vector which will provide the optimum cost
value involves the copying of a segment of a solution
vector of random length and starting at a random position
as an overlay for a segment of the solution vector of the
same length at another random position. One or more of
the values of the solution vector is then chosen randomly
and changed. This process generates a new solution
vector representing a new set of configuration parameters
used by the modeller 50. The generation of the new
configuration parameters using this technique provides
an improved method of searching for the optimum cost
value.

Figure 2 is a schematic illustration of a second
aspect of the present invention wherein the technique of
Figure 1 is applied to the control of the configuration
of a physical system 70. The modeller 50 and optimiser
60 operate as described hereinabove with reference to
Figure 1. Once the optimiser has determined an optimum
cost value, the configuration parameters corresponding
to the optimum cost value are output to the physical
system 70 in order that the system be optimally
configured. In order for the modeller 50 to accurately
model the physical system 70, the physical system outputs
operating parameters to the modeller 50. In this aspect
of the present invention the use of the modeller 50
enables the prediction of the outcome of changes in the
configuration of the physical system before the changes
are actually made to the physical system 70. In other
words, no non-optimum changes need be made to the
configuration of the physical system 70 in order to try
to achieve an optimum configuration. The determination
of the optimum configuration is predicted using the
model.

The operating parameters output from the physical
system 70 to the modeller 50 can comprise any measured
data which is required for the accurate modelling of the
physical system by the modeller 50. Such data can be
measured from the physical system.

Figure 3 is a schematic illustration of an aspect
of the present invention wherein a distributed database
is optimally configured. This system is as described
with reference to Figure 2 except that the physical
system is a distributed database comprising data servers
95 and terminals 90 operating clients requiring access
to data on the databases 95 and interconnected via a
network 80. The modeller 50 receives usage data
indicating the pattern of usage of the data in the
databases 95 by the clients in the terminals 90. The
modeller 50 also requires information on communication
speeds in order to accurately model the distributed
database. The optimiser 60 will search for an optimum
solution vector representing configuration parameters for
the distributed database which indicate the optimum usage
pattern, i.e. which databases 95 the clients on the
terminals 90 should look to for retrieval of data. If
the usage patterns perform a repeated cycle or are
predictable, projected usage data can be fed into the
model instead of being retrieved from the network 80, thereby
allowing the system to generate the new configuration
parameters based on anticipated work load. The data
units held in the databases 95 can comprise any form of
information such as text, audio, video or images, or any
other form of database information.

The distributed database can be provided over a
local area network such as an Ethernet or a corporate
intranet, or over a wide area network such as the
Internet.

The distributed database can either be a homogeneous
database wherein all of the data units are of the same
format, or it can comprise a heterogeneous database
wherein data units are stored in different formats. In
a heterogeneous database it will be necessary to translate
between the different formats of the data units in order
for data to be passed between databases, e.g. during data
"mirroring" operations, and during retrieval and update
operations if the format of the data operated on by the
client is different to the format of data in the
database.

The reconfiguration of the database based on the
configuration parameters results in a client being
directed to a particular data server for data. When the
configuration is changed the client is directed to a new
data server. If the data server does not contain the
required data it is necessary for the data to be copied
from a data server containing that data. Thus, the
reconfiguration of the distributed database comprises a
mix of simple redirection of queries from a client to a
database server and the copying of data between data
servers. Since the copying of data between data servers
requires a high bandwidth over the network, this
operation is minimised where possible. One scenario
where the copying of data between data servers takes
place is when a data server becomes heavily used by
clients. The "mirroring" of the data to another data
server will spread the load between the data servers,
thereby enhancing the performance of the distributed
database.

The adaption of the configuration of the distributed
database can either take place continuously during the
operation of the distributed database, at a period when
the use of the distributed database is low, or at
predetermined times.



The modeller 50 and optimiser 60 are preferably
implemented as a computer program operating on a
computer. In order to reduce the amount of processing
that is carried out, the calculation of the optimum
configuration parameters can be triggered to occur only
in response to a suitable criterion such as the failure
of a data server, a network link failure or change, or
when the time taken by one or more of the clients to
retrieve and/or update data units at one or more of the
data servers exceeds a threshold. This avoids the
adaption process continuously hunting for an optimum
solution when the distributed database may already be
optimally configured or near optimally configured.
Alternatively, the adaption algorithm can operate
continuously but the configuration determined may only
be applied if the predicted improvement in performance
exceeds a threshold. Since the overheads associated with
reconfiguration can be significant, a balance must be
struck between accepting sub-optimal performance and the
frequency of reconfiguration.
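
The two alternatives described above can be expressed as simple predicates. The Python sketch below is illustrative only: the function names, argument names and the form of the thresholds are hypothetical, and cost values are assumed to be transaction times, so that a lower value is better.

```python
def should_search(server_failed, link_changed, transaction_time, time_limit):
    """First alternative: run the optimiser only when a triggering criterion is met."""
    return server_failed or link_changed or transaction_time > time_limit


def should_apply(current_cost, predicted_cost, min_gain):
    """Second alternative: the optimiser runs continuously, but a new
    configuration is applied only if the predicted improvement exceeds a threshold."""
    return (current_cost - predicted_cost) > min_gain


# A slow transaction triggers a search even though nothing has failed.
trigger = should_search(False, False, transaction_time=1.2, time_limit=1.0)
# A predicted gain of 0.5 is not worth the reconfiguration overhead here.
apply_now = should_apply(current_cost=10.0, predicted_cost=9.5, min_gain=2.0)
```

Raising `min_gain` trades some sub-optimal performance for fewer reconfigurations, which is the balance referred to above.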

Figure 4 illustrates a distributed database in one
embodiment of the present invention wherein the first
database 1 (DB1) and the second database 2 (DB2) are
connected to a network 3 and thus form nodes in the
network. The first database 1 is a relational database
and includes a relational database management system
(RDBMS) 4. The second database 2 is an object oriented
database and includes an object oriented database
management system (OODBMS) 5. The database management
systems 4 and 5 manage the respective databases, e.g.
control access, security, backups etc., and operate to
interface the databases 1 and 2 to the network 3.

A first work station 6 is connected to the
network 3 and operates a first application APP1 which is
interfaced to the network 3 by an access manager AM1.
A second work station 7 is also connected to the network
3 and runs a second application APP2 which is interfaced
to the network 3 via an access manager AM2. The
applications APP1 and APP2 include performance files 8
and 9 for the storage of the monitored performance during
retrieval and/or updating of data to the databases 1 and
2. The access managers AM1 and AM2 include respective
usage files 10 and 11 for storing information on the
usage of databases 1 and 2. Also the access managers AM1
and AM2 include translators 12 and 13 for translating
queries and data between database formats when necessary.

The database network also includes a controller 14
which operates an adaptor application 15 for carrying out
the adaption algorithm for the control of the
configuration of the database. The adaptor application
15 is interfaced with the network 3 via an adaptor
application access manager AM3. The access manager AM3
includes a translator 16 for translating data between
different database formats where necessary and a parser
for parsing generic messages to move, copy and/or delete
data. The controller 14 also contains tables 17 which
contain data identifying the location of data files in
the databases, information identifying the type of format
of the database and the location, and information
identifying the database containing information which may
be accessed by each application APP1 and APP2.

Although in Figure 4 only two work stations 6 and
7 and two databases 1 and 2 are illustrated, any number
can be provided over the network 3. Also, although work
stations 6 and 7 are illustrated as each operating only a
single respective application APP1 and APP2, they can, in a
multitask environment, be operating more than one
application which can retrieve and/or update data, i.e.
the work stations 6 and 7 can host more than one client.
The respective access manager AM1 and AM2 identifies
which of the applications in a respective work station
6 and 7 has requested or sent data and it is routed
accordingly.

Within Figure 4 the adaptor application 15 and the 
table 17 are illustrated as being provided in a separate 
controller arrangement 14. However, these can be 
distributed throughout the work stations 6 and 7 as is 
illustrated in Figures 5 and 6. 

Figure 5 illustrates an embodiment wherein the work 
stations 6 and 7 include tables 17a and 17b which are 
copies of tables 17. This arrangement is advantageous
since the access managers AM1 and AM2 can access the
tables directly without incurring a network delay.

Figure 6 illustrates a further embodiment wherein 
the work stations 6 and 7 also run local adaptor 
applications 15a and 15b and thus the adaption algorithm 
is carried out in a distributed manner. Local parsers 
20a and 20b are provided in the respective access
managers AM1 and AM2 for parsing generic messages from
the respective adaptor applications 15a and 15b. This
distribution of the adaptor application will benefit
simply from parallel processing, or the adaptor
applications 15a and 15b could operate on a localised
basis to control distribution of data locally, i.e. only
at a few of the databases.

Methods of adaptively controlling the configuration 
of the distributed databases will now be described. 

Figure 7a illustrates the steps carried out when
data is requested from a database by an application on
a work station. In step S1 the application generates a
query for data which includes the data identification
(ID). In step S2 the access manager responds to the query
and looks in the tables to determine the application
location, the database location for the applications and
the identity and type of database containing data. In
step S3, based on the information from the tables, it is
determined whether there is a difference in format
between the format of the database and the format of the
application. In step S4 if there is a difference in
format the query is translated and in step S5 the data
is obtained from the database. In step S6 the access
manager records the database usage in the usage file. If
the format of the database is different to the format of
the application, in step S7 it is determined whether the
translation of the data is necessary and in step S8 the
translation of the data takes place if necessary. In step
S9 the data is returned to the application by the access
manager. In step S10 the application records, in a
performance file, how long the transaction took to
complete which is an indication of the performance of the
retrieval operation.

In step S11 the adaptor application reads the
performance and usage files and in step S12 the adaptor
application can determine whether an adaption of the
distributed database is necessary. This adaption can
take place only off line, continuously in real time, only
at predetermined times, or only when network traffic will
permit the transference of data over the network between
databases.

The usage file used in step S6 contains an 
identification of the data record retrieved from the 
database, an identification of the database from which 
it was retrieved, an identification of the application 
which retrieved the data record, and the date and time 
of retrieval. The usage file can either comprise a
single file for all of the databases, or it can comprise a
single file for each of the databases. This is used for
determining an optimum configuration for the distributed
database.

The performance files comprise a file for each 
application. The performance files contain data 
identifying the retrieved record data, the application 
identification, the query identification, and the 
transaction duration. These are used for determining 
whether to determine an optimum configuration for the 
distributed database. 
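
The contents of the usage and performance files listed above could be represented as simple records. The following is a sketch under stated assumptions: the class names, field names and the `retrieval_rate` helper are hypothetical and chosen only to mirror the fields named in the text.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class UsageRecord:
    """One entry of a usage file (field names are illustrative)."""
    record_id: str         # identification of the data record retrieved
    database_id: str       # database from which it was retrieved
    application_id: str    # application which retrieved it
    retrieved_at: datetime # date and time of retrieval


@dataclass
class PerformanceRecord:
    """One entry of a per-application performance file (illustrative)."""
    record_id: str
    application_id: str
    query_id: str
    duration_ms: float     # transaction duration


def retrieval_rate(usage, application_id, window_seconds):
    """Events per second for one application over an observation window."""
    hits = sum(1 for u in usage if u.application_id == application_id)
    return hits / window_seconds
```

A rate derived in this way is the kind of figure the adaption algorithm takes as its retrieval rate input.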

In the embodiments described hereinabove the tables 
17, 17a and 17b can comprise a number of different tables 
containing information used for identifying the location 
of data in the distributed database. The system 
configuration table can contain data which identifies the
database, the database type, e.g. a relational database or
an object oriented database, and the connect string used
to connect to the database over the network. The
advantage of using such a table is that when a new 
database is added to the distributed database network, 
its details can simply be added to this file and the 
algorithm, as will be described hereinafter, can adapt 
the network to efficiently accommodate it. 

The tables can also include a data allocation table for
each work station which records where an application on
a work station should look for each data record. In
these embodiments of the present invention these tables
specify where the information can be retrieved for an
application.

The tables can also include a data configuration 
table which lists all of the locations of each type of 
data record.

The tables can further include an application
configuration table which is used by the access manager
so that it can determine where each application is
running and so which data allocation to use. The table
thus contains the application identification and the
identification of the work station on which it is
running.
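
As an illustration of how the four tables might cooperate, the following sketch models them as dictionaries. All keys, connect strings, station names ("WS6", "WS7") and the record type "customer" are invented for the example; only the roles of the tables come from the description above.

```python
# System configuration table: database id -> type and connect string.
SYSTEM_CONFIG = {
    "DB1": {"type": "relational", "connect": "db1.example:5000"},
    "DB2": {"type": "object-oriented", "connect": "db2.example:5001"},
}

# Application configuration table: application id -> work station.
APPLICATION_CONFIG = {"APP1": "WS6", "APP2": "WS7"}

# Data allocation tables: per work station, where to read each record type.
DATA_ALLOCATION = {
    "WS6": {"customer": "DB1"},
    "WS7": {"customer": "DB2"},
}

# Data configuration table: every location of each record type.
DATA_CONFIG = {"customer": ["DB1", "DB2"]}


def resolve_read(app_id, record_type):
    """Return (database id, database type) an application should read from."""
    station = APPLICATION_CONFIG[app_id]
    db_id = DATA_ALLOCATION[station][record_type]
    return db_id, SYSTEM_CONFIG[db_id]["type"]


def resolve_update(record_type):
    """Return every database holding the record type (all must be updated)."""
    return list(DATA_CONFIG[record_type])
```

Here a read follows the application configuration and data allocation tables to a single database, while an update consults the data configuration table to reach every copy.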

Figure 7b is a flow diagram of steps carried out for
an update. In step S100 the application generates an
update query and identifies the data type which is to be
updated. The access manager receives the query and data
from the application in step S101 and looks in the tables
to determine the application location, the database
locations containing the data type and the identity of
the databases containing the type of data. In step S102
it is determined whether it is necessary to translate the
update query and the data from one database format to
another depending upon the database format of the data
generated by the application and the data format of the
type of database to be updated. If translation between
the database formats is required, in step S103 the update
query and data are translated. In step S104 the data is
then transferred to the databases containing the data
type to update the data therein. In step S105 the access
manager records the database usage in the usage file and
in step S106 the database management system returns a
confirmation of the update to the application. The
application then records the performance in the
performance file in step S107 and in step S108 the
adaptor application reads the performance and usage files.
If the adaptor application determines that the performance
of the distributed database can be improved, in step S109
the adaptor application adapts the distributed database.

Thus in accordance with the update method, all of
the database types which correspond to the type of data
generated by the application are updated for subsequent
access by any other application in the distributed
database.

The adaptor algorithm in accordance with one 
embodiment of the present invention will now be 
described.

Applications access data either on a local server
or, via a communications link, on a remote server. The
best location of data depends upon the concentrations of
demand and the relative performance cost of each server
and the associated communications costs. An optimum
configuration may require data replication so that the
overall demand for given data can be shared across
multiple servers and so reduce load and performance costs
for a given server. Whilst the distribution of data
across multiple servers is desirable and advantageous for
data retrieval, this poses difficulties for updating data
since data must be updated at all servers. The algorithm
aims to compute an improved cost solution to cause a
reconfiguration of the distributed database when overall
performance figures have exceeded an undesirable
threshold. Other factors can trigger the adaption such
as the failure or anticipated shut down of one or more
communication links or databases, or communication costs
for a communication link exceeding a threshold.

The following algorithm, in accordance with one
embodiment of the present invention, computes costs
in terms of response times, i.e. the time between
initiating the request for data and the data being
received.

The steps of the adaptation method of one embodiment
of the present invention will now be described with
reference to Figure 8.

In step S20 the adaption algorithm for the adaptor
application is triggered into operation by a threshold
decrease in performance being exceeded. In step S21 the
adaptor application calculates the optimum distribution
of data in the network and the optimum application to
data links taking into consideration the model parameters
as will be described hereinafter. In step S22 the
adaptor application determines whether there are
differences between the optimum distribution of data and
the current distribution of data and generates messages to
copy, move and/or delete data at database nodes in the
network and to update links which define the nodes which
are to receive queries from the applications, i.e. where
an application is to receive data. The adaptor
application access manager receives the messages in step
S23 and parses each message. In step S24 the adaptor
application access manager uses the tables, in
particular the configuration table, to identify the
database locations and types and send instructions to
database drivers.

In step S25 the database drivers copy, move and/or 
delete data in accordance with the instructions and the 
adaptor application access manager translates data if the 
data is copied or moved from one database type to 
another. When all the data has been moved, copied and/or 
deleted, in step S26 the adaptor application access 
manager informs the adaptor application that all 
instructions have been carried out. In step S27 the 
adaptor application updates the tables and indicates any
new locations of the data records in the network (i.e.
updates the data configuration file) and updates the
allocation file to specify where each application can
find appropriate data in databases in the distributed
databases.



If data has to be moved from one database to another
in order to increase performance, instead of moving the
data, the data could be copied across from one database
to the other and subsequently deleted from the original
database. This operation ensures that a copy of the data
is always available for access. If the data is simply 
moved, the data may not be available during transit from 
one database to another. Further, if the movement of 
data is divided into a two stage copy and delete process, 
not all of the allocation tables need to be updated 
simultaneously. Before the deletion of data, some 
applications can still access data from the old locations 
until their data allocation tables have been updated to 
indicate the new locations. Thus dividing the move 
operation into a copy and delete two stage operation may 
have advantages in a system which operates a real time 
adaptive algorithm. 
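
The two stage copy and delete operation can be sketched as follows. This is an illustrative model only: databases are represented as dictionaries, and the function and parameter names are assumptions rather than part of the described system.

```python
def move_by_copy_then_delete(record_id, src_name, dst_name, databases, allocations):
    """Move a record so that a readable copy exists at every moment.

    databases:   database name -> {record id -> data}
    allocations: work station -> {record id -> database name to read from}
    """
    # Stage 1: copy, so that the record exists at both locations.
    databases[dst_name][record_id] = databases[src_name][record_id]

    # Allocation tables can be re-pointed one at a time. Until a table is
    # updated, its applications still read the old, still valid, copy.
    for table in allocations.values():
        if table.get(record_id) == src_name:
            table[record_id] = dst_name

    # Stage 2: delete the original once no table refers to it.
    del databases[src_name][record_id]
```

In a real system the loop over allocation tables would be spread over time, which is where the benefit of the two stage operation arises.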

Although the adaption method of Figure 8 has been 
described as including the moving of data, to optimise 
the database configuration it is not always necessary to 
move data between servers. It may be possible to improve 
performance simply by changing the servers to which 
queries are sent by one or more of the applications. For
example, where data is present for an application on two
servers, and the one currently receiving the queries for
retrieving data is under a heavy load, the adaption
method can simply cause the application query to be
directed to the other server which is less heavily
loaded.
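
A query redirection of this kind might be sketched as below. The load measure, the limit and the function name are assumptions made for the illustration.

```python
def redirect_if_overloaded(current, replicas, load, limit):
    """Return the server that should receive an application's read queries.

    current:  server currently receiving the queries
    replicas: servers which already hold the data for the application
    load:     server -> current load (any consistent measure)
    limit:    load above which the current server counts as heavily loaded
    """
    if load[current] <= limit:
        return current  # no change needed, and no data is moved
    # Otherwise pick the least loaded server that already holds the data.
    return min(replicas, key=lambda server: load[server])
```

No data is copied or moved in this case; only the link between the application and a server changes.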

Specific adaption algorithms for determining a data 
distribution and the data servers to which application 
queries are sent and which can be used to adapt the 
configuration of the distributed database to improve 
performance (access time) will now be described.

The methods use tables of figures which represent 
possible client server connections and related 
transaction rates and performance figures. These are 
modelling parameters which represent a model of the 
distributed database. The performance figures represent 
costs that are likely to be incurred for a given 
configuration and are derived from measured values for 
similar transaction types to the ones being considered. 

A model of the distributed database used in the
following embodiments includes as parameters forms of
both data servers and communications networks so that the
performance of the system seen by individual client
applications can be estimated for the current load
conditions over a range of different access and server
configurations. The choice of configuration is
determined by an optimisation algorithm which produces
solution vectors defining, for each client, which server
that client should currently connect to for read access
(update access must be made to all servers maintaining
a copy of the database).

The algorithm derives costs from retrieval and
update rates which can be obtained from the usage files
in the embodiments described hereinabove and the degree
of contention between retrieval and update transactions,
as well as contention between clients on the same server
node. In the algorithm it is assumed that the server and
communications performance can be based on an M/M/1 queue.
This is based on random transaction arrival rates,
exponential service times and the assumption of a single
server per client. The principal equations used in the
algorithms are:

Response time = 1 / ((1/BTT) - (1/TIT))    (1)

where: BTT is the Base Transaction Time, and
       TIT is the Transaction Interarrival Time

Overall Transaction Rate (OTR) = ((sum of Retrieval
and Update Transaction Rates - Maximum Transaction Rate)
× contention value) + Maximum Transaction Rate    (2)

Thus if the contention value equals 100%, then 
Overall Transaction Rate equals sum of Transaction Rates. 

If the contention value equals zero, the Overall 
Transaction Rate equals the Maximum Transaction Rate. 

For all the contention values, the Overall
Transaction Rate will be between the two extremes.
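
Equations 1 and 2 can be evaluated numerically as in the following minimal sketch. The function names are hypothetical, the contention value is expressed as a fraction between 0 and 1, and a saturated queue (TIT not greater than BTT) is reported as an infinite response time; these are assumptions of the example.

```python
def response_time(btt, tit):
    """Equation 1: 1 / ((1/BTT) - (1/TIT)), in the same time unit as BTT."""
    service_rate = 1.0 / btt
    arrival_rate = 1.0 / tit
    if arrival_rate >= service_rate:
        return float("inf")  # the queue never drains
    return 1.0 / (service_rate - arrival_rate)


def overall_transaction_rate(rates, contention):
    """Equation 2: ((sum of rates - maximum rate) x contention) + maximum rate."""
    peak = max(rates)
    return (sum(rates) - peak) * contention + peak
```

For example, with BTT = 3000 ms and TIT = 10000 ms, equation 1 gives 30000/7, approximately 4286 ms. With rates of 0.1 and 0.01 events per second, equation 2 gives 0.11 for a contention value of 100% and 0.1 for a contention value of zero.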



Figure 9 is a schematic illustration of a first 
database scenario (A) wherein the distributed database 
comprises 10 databases 100b (A to J) and 10 clients 100a 
(1 to 10) connected over a network 200 wherein the 
communication speeds over the network are uniform. Each 
client 100a resides with a respective database 100b at 
a node 100 of the network 200. Thus each client can
access its respective database without incurring a 
network delay. 

The configuration of the distributed database can 
be described as a solution vector an example of which is 
illustrated in Figure 10. The solution vector comprises 
a string of numbers indexed by client and containing an
identification of the server which each client should
access to read data. In this illustrated example it can
be seen that a server may be used by more than one client 
e.g. clients 1 and 8 access server B and clients 3 and 
9 access server A. 
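
A solution vector of this form can be held as a simple list indexed by client. In the sketch below only the assignments of clients 1 and 8 to server B and clients 3 and 9 to server A are taken from the text; the remaining entries and the helper name are invented for illustration.

```python
from collections import defaultdict

# Position i holds the server read by client i + 1 (hypothetical values
# except for clients 1, 3, 8 and 9).
solution = ["B", "C", "A", "D", "E", "F", "G", "B", "A", "J"]


def clients_by_server(vector):
    """Group client numbers (starting at 1) by the server they read from."""
    groups = defaultdict(list)
    for client, server in enumerate(vector, start=1):
        groups[server].append(client)
    return dict(groups)
```

Inverting the vector in this way shows directly which servers are shared and which servers are not used for retrieval at all.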

Two genetic algorithm methods will now be described 
for determining the optimum solution vector for the
scenario illustrated in Figure 9. 

Tables 1, 2 and 3 illustrate the modelling
parameters used in these two genetic algorithm methods.
The algorithm operates for a distributed database as
illustrated in Figure 9, i.e. 10 clients and 10 servers,
and a server contention overlap of 10% is assumed.

TABLE 1

Server Base Transaction Times (BTT) per node (in
milliseconds) for database scenario A:

SERVER    BTT
1         3000
2         2800
3         3100
4         2900
5         4000
6         5000
7         3500
8         4000
9         4500
10        3750

TABLE 2

Client Application rates per node: Retrieval Rate (RR),
Update Rate (UR), Overlap % (rates in events per second) for
database scenario A

CLIENT    RR      UR      OVERLAP
1         0.1     0.01    20
2         0.2     0.01    20
3         0.1     0.01    20
4         0.1     0.01    20
5         0.15    0.01    20
6         0.1     0.01    20
7         0.1     0.01    20
8         0.15    0.02    25
9         0.1     0.01    15
10        0.1     0.02    15


TABLE 3

Base Comms Table (time in milliseconds) for database
scenario A

Server/client     1     2     3     4     5     6     7     8     9    10
1               400  1000  2000  1000  1500  1200  1600  1700  1650  1800
2              1000   400  2000  2000  1800  1900  1750  1650  1800  1900
3              1000   800   400  2000  1750  1600  1670  1760  2000  1900
4               800  1200  2000   400  1850  1900  1950  1850  1900  2000
5               900  1100  1900  1700   400  1800  1600  1760  1540  1860
6               950  1050  1950  1800  1600   400  1550  1750  1750  1800
7              1000  1100  1900  1500  1650  1800   400  1700  1700  1800
8               900   950  1800  1900  1800  1500  1700   400  1900  1950
9               850  1050  1850  1950  1750  1700  1650  1700   400  1900
10              850  1000  1800  1900  1800  1750  1700  1650  1870   400


Figure 11 illustrates a method of a first embodiment
of the present invention for modelling the distributed
database using a proposed solution vector in order to
generate the cost value which, for this embodiment,
comprises a worst retrieval rate for the database.

In step S40 the effective transaction rate (ETR) is
calculated for each client by combining the retrieval and
update rates, using the percentage overlap and equation
2 given hereinabove. In step S41 a 2-dimensional array
L is populated with the update rate calculated for that
client and indexed by client and server number. Where
a server is to be used by a particular client as
indicated in the input current proposed solution from the
genetic algorithm (step S43), in step S42 the
corresponding array entry in L is replaced with that
client's effective transaction rate. A 2-dimensional
array C is then created in step S44 to hold effective
communication overhead delays. For each client/server
interaction rate now defined in L, the corresponding
entry in C is calculated using equation 1 given
hereinabove. In step S45, for each server, the client
entries in L are used to calculate aggregate loading from
all clients of that server using equation 2 given
hereinabove. In step S46 this is then converted to an
average response time using equation 1 given hereinabove
and this is stored in a 1-dimensional array T indexed by
server. Any calculated infinite response times are
replaced with a suitably large response time. In step
S47, for each server, the client with the worst
communications overhead delay is found from C and this
is added to the relevant entry in T. In step S48 the
largest value in T is then output and this represents the
worst performance of this database configuration.
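
The evaluation of Figure 11 might be sketched as follows. This is one reading of steps S40 to S48 under several assumptions that are not stated in the text: rates are in events per second and are converted to an interarrival time in milliseconds, the comms table is indexed as comms[client][server], servers are numbered from 0, a zero rate gives a delay equal to the base time, and saturated queues are given the value 1e9. All function and parameter names are hypothetical.

```python
BIG_MS = 1e9  # assumed stand-in for "a suitably large response time"


def queue_delay_ms(btt_ms, rate_per_s):
    """Equation 1, with TIT taken as 1000 / rate (milliseconds)."""
    if rate_per_s <= 0:
        return btt_ms
    tit_ms = 1000.0 / rate_per_s
    if tit_ms <= btt_ms:
        return BIG_MS
    return 1.0 / (1.0 / btt_ms - 1.0 / tit_ms)


def combined_rate(rates, contention_pct):
    """Equation 2, with the contention value given as a percentage."""
    peak = max(rates)
    return (sum(rates) - peak) * contention_pct / 100.0 + peak


def worst_response_time(solution, rr, ur, overlap_pct, server_btt, comms,
                        server_contention_pct=10.0):
    clients = range(len(solution))
    servers = range(len(server_btt))
    # S40: effective transaction rate per client.
    etr = [combined_rate([rr[c], ur[c]], overlap_pct[c]) for c in clients]
    # S41, S42: array L holds update rates, except on the chosen server.
    load = [[ur[c] for _ in servers] for c in clients]
    for c, s in enumerate(solution):
        load[c][s] = etr[c]
    # S44: array C of communication overhead delays.
    delay = [[queue_delay_ms(comms[c][s], load[c][s]) for s in servers]
             for c in clients]
    # S45 to S47: array T of server response plus worst comms delay.
    totals = []
    for s in servers:
        aggregate = combined_rate([load[c][s] for c in clients],
                                  server_contention_pct)
        totals.append(queue_delay_ms(server_btt[s], aggregate)
                      + max(delay[c][s] for c in clients))
    # S48: the largest entry is the cost of this configuration.
    return max(totals)
```

A lower returned value indicates a better configuration, which is how the genetic algorithms described below rank candidate solution vectors.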

A first algorithm for generating proposed solution 
vectors will now be described hereinafter with reference
to Figure 12. This flow diagram illustrates the steps 
of a Breeder genetic algorithm. 

In step S50 an initial random population P is
created using a non-binary representation. Each gene
position corresponds to a client node and the allele
indicates which server that client is to use for
retrieval access. The maximum number of generations G
to be allowed is calculated in step S51 from the
following equation:

G = 5000 / ((population size/2) + 1)    (3)

In step S52 all the members of the population are
then evaluated using the method of Figure 11. In step
S53 g is set to 0. In step S54 the current generation
number g is incremented by 1 and a loop in the algorithm
is entered. All of the members of the population are
sorted in step S55 based on the evaluation result such
that the lowest result is sorted to the top, i.e. is the
best. The bottom half of the population is then deleted
in step S56 and thus the current population p is set to
equal half of the total population P. In step S57 the
current population p is incremented by 1 and in step S58
two members from the top half of the population are
chosen at random and a new member is generated using the
technique which will be described hereinafter with
reference to Figures 14 and 15. In step S59 using
uniformly distributed allele replacement each gene is
mutated in the new member based on the defined percentage
chance of mutation. In step S60 the new member is
evaluated using the procedure of Figure 11 and this is
added to the bottom of the population list. In step S61
it is then determined whether the original population
size has been restored, i.e. p = P, and if not the process
returns to step S57. If the original population size P
has been restored the process proceeds to step S62
whereupon it is determined whether the maximum number of
generations G has been reached, i.e. g = G. If g ≠ G the
process returns to step S54. If g = G the process
proceeds to step S63 where all the members of the
population are sorted based on the evaluation results
with the lowest result as best. In step S64 the member
of the population with the lowest evaluation result is
output. This can then be used for determining the
configuration of the distributed database.
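
The loop of Figure 12 could be sketched as below. This is a hedged illustration rather than the patented implementation: servers are numbered from 0, the cost function is supplied by the caller, the evaluation budget is a parameter (5000 in the text), equation 3 is rounded down, and the names `breeder_ga` and `overlay_crossover` are hypothetical.

```python
import random


def overlay_crossover(parent1, parent2, rng):
    """Copy parent 2, then overwrite a random run with a run from parent 1."""
    child = list(parent2)
    length = rng.randint(1, len(parent1))
    src = rng.randint(0, len(parent1) - length)
    dst = rng.randint(0, len(parent1) - length)
    child[dst:dst + length] = parent1[src:src + length]
    return child


def breeder_ga(cost, n_clients, n_servers, pop_size=20, mutation=0.14,
               budget=5000, seed=0):
    rng = random.Random(seed)
    # S50: random initial population, one gene (server index) per client.
    pop = [[rng.randrange(n_servers) for _ in range(n_clients)]
           for _ in range(pop_size)]
    # S51: equation 3, rounded down.
    generations = budget // (pop_size // 2 + 1)
    # S52: evaluate every member.
    scored = [(cost(m), m) for m in pop]
    for _ in range(generations):                      # S53, S54, S62
        scored.sort(key=lambda pair: pair[0])         # S55: best first
        scored = scored[:pop_size // 2]               # S56: drop bottom half
        parents = [m for _, m in scored]
        while len(scored) < pop_size:                 # S57, S61
            p1, p2 = rng.sample(parents, 2)           # S58
            child = overlay_crossover(p1, p2, rng)
            child = [rng.randrange(n_servers) if rng.random() < mutation
                     else gene for gene in child]     # S59
            scored.append((cost(child), child))       # S60
    scored.sort(key=lambda pair: pair[0])             # S63
    return scored[0]                                  # S64: (cost, vector)
```

Because the top half survives each generation unchanged, the best cost found never gets worse from one generation to the next.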

Figure 13 illustrates an alternative method which
implements a single three way Tournament genetic
algorithm. In step S70 an initial random population is
created using a non-binary representation. Each gene
position corresponds to a client node and the allele
indicates which server that client is to use for
retrieval access. In step S71 the maximum number of
additional evaluations A to be allowed is calculated
using the following equation:

A = 5000 - population size    (4)

In step S72 all the members of the population are
evaluated using the method of Figure 11. In step S73 the
current evaluation a is set to 0 and in step S74 the
current evaluation a is incremented to enter a loop in
the algorithm. Three members of the population are then
chosen at random in step S75 and in step S76 these are
sorted into BEST, SECOND and WORST. A new member is
created in step S77 from BEST and SECOND using the
technique described hereinafter with reference to Figures
14 and 15. Each gene in this new member is mutated in
step S78 based on the defined percentage chance of
mutation using the uniformly distributed allele
replacement. The new member is then evaluated in step S79
and in step S80 the new member is used to replace the
WORST. In step S81 it is then determined whether the
number of additional evaluations has been reached, i.e.
a = A, and if not the process returns to step S74. If the
number of additional evaluations has been reached, the
process proceeds to step S82 where all the members of the
population are sorted based on the evaluation result with
the lowest result as BEST. In step S83 the member of the
population with the lowest evaluation result is then
output. Details on this output member of the population
can then be used for the configuration of the distributed
database in order to improve the system performance.
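
The tournament method of Figure 13 could be sketched as follows. The sketch assumes that BEST plays the role of parent 1 and SECOND the role of parent 2 in the overlay technique, that servers are numbered from 0, and that the evaluation budget is a parameter (5000 in the text); the function name is hypothetical.

```python
import random


def tournament_ga(cost, n_clients, n_servers, pop_size=20, mutation=0.14,
                  budget=5000, seed=0):
    rng = random.Random(seed)
    # S70: random initial population.
    pop = [[rng.randrange(n_servers) for _ in range(n_clients)]
           for _ in range(pop_size)]
    extra = budget - pop_size                         # S71: equation 4
    scores = [cost(m) for m in pop]                   # S72
    for _ in range(extra):                            # S73, S74, S81
        picks = rng.sample(range(pop_size), 3)        # S75
        best, second, worst = sorted(picks, key=lambda i: scores[i])  # S76
        # S77: overlay a random run of BEST onto a copy of SECOND.
        child = list(pop[second])
        length = rng.randint(1, n_clients)
        src = rng.randint(0, n_clients - length)
        dst = rng.randint(0, n_clients - length)
        child[dst:dst + length] = pop[best][src:src + length]
        # S78: uniformly distributed allele replacement.
        child = [rng.randrange(n_servers) if rng.random() < mutation
                 else gene for gene in child]
        # S79, S80: evaluate the child and replace WORST with it.
        pop[worst], scores[worst] = child, cost(child)
    winner = min(range(pop_size), key=lambda i: scores[i])  # S82, S83
    return scores[winner], pop[winner]
```

With these settings the cost function is called exactly `budget` times: once per initial member and once per additional evaluation.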

Although in both genetic algorithms described above
5000 evaluations are used, any suitable number can be
used. Mutation rate and population size can be
appropriately selected to tune the genetic algorithm.
For example, a mutation rate of 14% can be chosen, with
a population size of anything from 5 to 500.

The method of generating the new member in Figures 
12 and 13 will now be described with reference to Figures 
14 and 15. 

Using the two parents, in step S90 an initial child 
is generated as an exact copy of parent 2. A portion of 
parent 1 of random length and at a random position is 
then selected in step S91 i.e. length = 5 and position 
= 8 in this example. This overlay portion is then 
overlaid onto a portion of the initial child of the same 
length at another random position in step S92 (i.e. at 
position = 4 in this example) to generate the resulting 
child as illustrated in Figure 15. 
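
The overlay operation can be sketched as a short function. The sketch uses 0-based positions, which is an assumption (the text does not say how positions are counted), and the parent values below are invented: upper case letters mark genes from parent 1 and lower case letters mark genes from parent 2.

```python
def overlay(parent1, parent2, length, source_pos, target_pos):
    """Generate a child by overlaying part of parent 1 onto a copy of parent 2."""
    if source_pos + length > len(parent1) or target_pos + length > len(parent2):
        raise ValueError("overlay portion must fit inside both parents")
    child = list(parent2)                                   # S90: copy parent 2
    portion = parent1[source_pos:source_pos + length]       # S91: select portion
    child[target_pos:target_pos + length] = portion         # S92: overwrite
    return child


parent1 = list("ABCDEFGHIJKLMN")
parent2 = list("abcdefghijklmn")
child = overlay(parent1, parent2, length=5, source_pos=8, target_pos=4)
```

With length = 5, source position 8 and target position 4, the child reads "abcdIJKLMjklmn": the overlaid genes replace those of parent 2 directly and the child keeps the same length.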

This technique is a variant of a two-point crossover 
technique which causes skewing. In this technique allele 
values in the child are directly overwritten by the 
overlay portion. There is no splicing and shunting of 
the genes.

This technique is directly applicable to allelic
representations in which the allowed allele range is the
same for each gene. Where allelic representations are
not of the same range, mechanisms can be applied to
align the allelic representations.

Figure 16 is a schematic illustration of a second
distributed database scenario (B) wherein the nodes 400
and 500 are effectively split into two geographic regions
having low communications costs between nodes in the same
region, and high costs between regions. As can be seen
in the left hand side in Figure 16 the four nodes 400
comprising data servers 400a (A to D) and clients 400b
(1 to 4) are interconnected via a communications network
600 of high speed. Similarly, in the right hand side the
six nodes 500 comprising data servers 500a (E to J) and
clients 500b (5 to 10) are interconnected via a
communications network 700 of high speed. The two high
speed communication networks 600 and 700 are
interconnected via a low speed communications line 300.
Each of the local communications networks has a
"supernode" whose data performance is 10 times that of
the other nodes in the region.

Tables 4 to 6 below illustrate the modelling
parameters used in the genetic algorithm methods for this
scenario. A server contention overlap of 10% is assumed
as for the first scenario (A).

Table 4

Base Comms Table (time in msec)
for Scenario B

Server/client     1     2     3     4     5     6     7     8     9    10
1               400   800   800   800  3000  3000  3000  3000  3000  3000
2               800   400   800   800  3000  3000  3000  3000  3000  3000
3               800   800   400   800  3000  3000  3000  3000  3000  3000
4               800   800   800   400  3000  3000  3000  3000  3000  3000
5              3000  3000  3000  3000   400  1000  1000  1000  1000  1000
6              3000  3000  3000  3000  1000   400  1000  1000  1000  1000
7              3000  3000  3000  3000  1000  1000   400  1000  1000  1000
8              3000  3000  3000  3000  1000  1000  1000   400  1000  1000
9              3000  3000  3000  3000  1000  1000  1000  1000   400  1000
10             3000  3000  3000  3000  1000  1000  1000  1000  1000   400


Table 5

Client application rates per node: Retrieval Rate (RR),
Update Rate (UR), Overlap (%) (rates in events per second)
for database Scenario B

Client    RR      UR      Overlap (%)
1         0.2     0.005   20
2         0.2     0.005   20
3         0.05    0.02    5
4         0.25    0.005   20
5         0.2     0.005   20
6         0.1     0.005   20
7         0.1     0.005   20
8         0.25    0.005   20
9         0.05    0.01    5
10        0.1     0.005   20

Table 6

Server Base Transaction Time (BTT) per
Node (in msec) for database scenario B

Server    BTT
1         5000
2         5000
3         500
4         5000
5         5000
6         5000
7         5000
8         5000
9         500
10        5000


Figures 17 to 22 illustrate methods of determining
an alternative cost value to the worst performance
cost value as determined in the method described with
reference to Figure 11.



The method of Figure 17 is termed "just used"
wherein the worst performance is output only for the
servers which appear in the solution vector generated by
the genetic algorithm.

In step S110 the Effective Transaction Rate (ETR)
is calculated for each client by combining the retrieval
and update rates, using the percentage overlap and
equation 2 given hereinabove. In step S111 a two
dimensional array L is populated with the update rate
calculated for that client and indexed by client and
server number. Where a server is to be used by a
particular client as indicated in the input current
proposed solution from the genetic algorithm (S113), in
step S112 the corresponding array entry in L is replaced
with that client's Effective Transaction Rate. In step
S114 the columns in the L array for servers which do not
appear in the solution vector are then zeroed to remove
their effect. A two dimensional array C is then created
in step S115 to hold effective communication overhead
delays. For each client/server interaction rate now
defined in L, the corresponding entry in C is calculated
using equation 1 given hereinabove. In step S116 the
columns in the C array for servers which do not appear
in the solution vector are then zeroed to remove their
effect. In step S117, for each server, the client entries
in L are used to calculate aggregate loading for all
clients on that server using equation 2 given
hereinabove. In step S118 this is then converted to an
average response time using equation 1 given hereinabove
and this is stored in a one dimensional array T indexed
by server. Any calculated infinite response times are
replaced with a suitably large response time. In step
S119, for each server which is represented in the
solution vector, the client with the worst communication
overhead delay is found from C and this is added to the
relevant entry in T. In step S120 the largest value in
T is then output and this represents the worst
performance of the servers which are to be used by the
clients in this database configuration.
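
The closing steps of the "just used" method (S119 and S120) can be sketched as follows. This is a minimal illustration only: it assumes that the response time array T and the communication delay array C have already been computed using equations 1 and 2, and the function name, argument names and list layout are hypothetical choices made for the example. Skipping the unused servers here has the same effect as zeroing their columns.

```python
def just_used_cost(T, C, used_servers):
    """Worst performance over the servers named in the solution vector.

    T[s]         -- average response time of server s (step S118)
    C[c][s]      -- communication overhead delay for client c and server s
    used_servers -- indices of the servers appearing in the solution vector
    (all names are assumptions made for this sketch)
    """
    worst = 0.0
    for s in used_servers:
        # S119: add the worst client communication delay to the server entry
        worst_delay = max(row[s] for row in C)
        worst = max(worst, T[s] + worst_delay)
    # S120: the largest value is output as the cost of this configuration
    return worst
```

In this sketch a server that is not in `used_servers` never contributes, however poor its response time.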



Figure 18 illustrates another embodiment termed 
"plus average" in which 10% of the access times for all 
nodes is added to the least worst server time. This 
strikes a balance between minimising the worst 
performance and aggregate server performance. 

In step S130 the Effective Transaction Rate (ETR)
is calculated for each client by combining the retrieval
and update rates, using the percentage overlap and
equation 2 given hereinabove. In step S131 a two
dimensional array L is populated with the update rate
calculated for that client and indexed by client and
server number. Where a server is to be used by a
particular client as indicated in the input current
proposed solution from the genetic algorithm (S133), in
step S132 the corresponding array entry in L is replaced
with that client's Effective Transaction Rate. A two
dimensional array C is then created in step S134 to hold
effective communication overhead delays. For each
client/server interaction rate now defined in L, the
corresponding entry in C is calculated using equation 1
given hereinabove. In step S135, for each server, the
client entries in L are used to calculate aggregate loading
for all clients on that server using equation 2 given
hereinabove. In step S136 this is then converted into
an average response time using equation 1 given
hereinabove and this is stored in a one dimensional array
T indexed by server. Any calculated infinite response
times are replaced with a suitably large response time.
In step S137, for each server, the client with the worst
communications overhead delay is found from C and this
is added to the relevant entry in T. In step S138 10%
of the average of all the values in T is calculated and
in step S139 the largest value in T plus 10% of the
average of all of the values in T is output as
the performance measure for this database configuration.
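
Steps S138 and S139 of the "plus average" method reduce to a short calculation, sketched below. The sketch assumes that T already holds, for every server, the average response time plus the worst communication delay (step S137); the function name is hypothetical.

```python
def plus_average_cost(T):
    """Largest server time plus 10% of the mean server time.

    T[s] -- response time of server s after step S137
    (the name and list layout are assumptions made for this sketch)
    """
    ten_percent_of_mean = 0.1 * sum(T) / len(T)  # S138
    return max(T) + ten_percent_of_mean          # S139
```

Because the mean term grows with every slow server, two configurations with the same worst server are separated by their aggregate performance.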

Figure 19 is a flow diagram of another method of 
calculating a cost value for the distributed database. 
This model considers applying updates only to those nodes 
that are currently being accessed as servers, and adds 
10% of the communications access time seen by all clients 
on the worst server, divided by the number of servers 
used. This adds a bias based on aggregate user 
perception but with weighting in favour of over 
duplication of data. This method can provide enhanced 
resilience and is referred to as "plus used". 

In step S140 the Effective Transaction Rate (ETR)
is calculated for each client by combining the retrieval
and update rates, using the percentage overlap and
equation 2 given hereinabove. In step S141 a two
dimensional array L is populated with the update rate
calculated for that client and indexed by client and
server number. Where a server is to be used by a
particular client as indicated in the input current
proposed solution from the genetic algorithm (S143), in
step S142 the corresponding array entry in L is replaced
with that client's Effective Transaction Rate (ETR). In
step S144 the columns in the L array for the servers
which do not appear in the solution vector are then
zeroed to remove their effect. In step S145 a two
dimensional array C is then created to hold effective
communication overhead delays. For each client/server
interaction rate now defined in L, a corresponding entry
in C is calculated using equation 1 given hereinabove.
In step S146 the columns in the C array for servers which
do not appear in the solution vector are zeroed and in
step S147, for each server which appears in the solution
vector, the client entries in L are used to calculate
aggregate loading for all clients on that server using
equation 2 given hereinabove. In step S148 this is then
converted to an average response time using equation 1
given hereinabove and this is stored in a one dimensional
array T indexed by server. Any calculated infinite
response times are replaced with a suitably large
response time. In step S152, for each server which is
represented in the solution vector, the client with the
worst communications overhead delay is found from C and
this is added to the relevant entry in T. Also, in step
S149, for the worst server in T, the average of the
communication times in C is calculated and in step S150
10% of the average is taken. This is then divided by the
number of different servers appearing in the solution
vector in step S151 and in step S153 the largest value in
T plus the value calculated in step S151 is output as the
performance value for the database configuration.

Figure 20 is a flow diagram of another method of
calculating a cost value for the distributed database.
This method is a combination of "just used" and "plus
average".

In step S160 the Effective Transaction Rate (ETR)
is calculated for each client by combining the retrieval
and update rates, using the percentage overlap and
equation 2 given hereinabove. In step S161 a two
dimensional array L is populated with the update rate
calculated for that client and indexed by client and
server number. Where a server is to be used by a
particular client as indicated in the input current
proposed solution from the genetic algorithm (S163), in
step S162 the corresponding array entry in L is replaced
by that client's Effective Transaction Rate (ETR). The
columns in the L array for servers which do not appear
in the solution vector are then zeroed in step S164 and
in step S165 a two dimensional array C is then created
to hold effective communication overhead delays. For
each client/server interaction rate now defined in L, the
corresponding entry in C is calculated using equation 1
given hereinabove. The columns in the C array for
servers which do not appear in the solution vector are
then zeroed in step S166 and in step S167, for each used
server, the client entries in L are used to calculate
aggregate loading for all clients on that server using
equation 2 given hereinabove. In step S168 this is then
converted to an average response time using equation 1
given hereinabove and this is stored in a one dimensional
array T indexed by server. Any calculated infinite
response times are replaced with a suitably large
response time. In step S169, for each server which is
represented in the solution vector, the client with the
worst communication overhead delay is found from C and
this is added to the relevant entry in T. In step S170
10% of the average of all of the values in T is
calculated and in step S171 the largest value in T plus
the calculated 10% of the average of all of the
values in T is output as the performance value for the
database.

Figure 21 is a flow diagram illustrating a further
method of calculating a cost value. In this method only
used servers, i.e. servers which appear in the solution
vector, are considered, and the average of all client
accesses weighted by their usage rate is added to all
used servers. This technique is termed "plus all" and is
thought to be the most realistic in terms of representing
user perception of quality of service of the distributed
database.

In step S180 the Effective Transaction Rate (ETR)
is calculated for each client by combining the retrieval
and update rates, using the percentage overlap and
equation 2 given hereinabove. In step S181 a two
dimensional array L is populated with the update rate
calculated for that client and indexed by the client and
server number. Where a server is to be used by a
particular client as indicated in the input current
proposed solution from the genetic algorithm (S183), in
step S182 the corresponding array entry in L is replaced
by that client's Effective Transaction Rate (ETR). The
columns in the L array for servers which do not appear
in the solution vector are then zeroed in step S184. A
two dimensional array C is then created in step S185 to
hold effective communication overhead delays. For each
client/server interaction rate now defined in L, the
corresponding entry in C is calculated using equation 1
given hereinabove. The columns in the C array for
servers which do not appear in the solution vector are
then zeroed in step S186. The process then diverges to
step S190 in which, for all used servers, the average of
the communications times in C is calculated, weighted by
how often the communication links are used. Also, in step
S187, for each used server the client entries in L are
used to calculate aggregate loading from all clients on
that server using equation 2 given hereinabove. In step
S188 this is then converted to an average response time
using equation 1 given hereinabove and this is stored in
a one dimensional array T indexed by server. Any
calculated infinite response times are replaced with a
suitably large response time. In step S189, for each
server which is represented in the solution vector, the
client with the worst communications overhead delay is
found from C and this is added to the relevant entry in
T. In step S191 the largest value in T plus the value
calculated in step S190 is output as the performance
value for the database.

Figure 22 is a flow diagram of a further method of
calculating a cost value. This method is similar to the
"plus all" technique except that only 10% of the average
of all client accesses weighted by their usage rate is
added. This technique is termed "plus 10%".

In step S200 the Effective Transaction Rate (ETR)
is calculated for each client by combining the retrieval
and update rates, using the percentage overlap and
equation 2 given hereinabove. In step S201 a two
dimensional array L is populated with the update rate
calculated for that client and indexed by client and
server number. Where a server is to be used by a
particular client as indicated in the input current
proposed solution from the genetic algorithm (S203), in
step S202 the corresponding array entry in L is replaced
with that client's Effective Transaction Rate. In step
S204 the columns in the L array for servers which do not
appear in the solution vector are zeroed. A two
dimensional array C is then created in step S205 to hold
effective communication overhead delays. For each
client/server interaction rate now defined in L, the
corresponding entry in C is calculated using equation 1
given hereinabove. In step S206 the columns in the C
array for servers which do not appear in the solution
vector are then zeroed and the process diverges. In step
S210, for all used servers (i.e. the servers which appear
in the solution vector), the average of the communications
times in C, weighted by how often the communication links
are used, is calculated. Also, in step S207, for each
server used, the client entries in L are used to
calculate aggregate loading from all clients on that
server using equation 2 given hereinabove. In step S208
this is then converted to an average response time using
equation 1 given hereinabove and this is stored in a one
dimensional array T indexed by server. Any calculated
infinite response times are replaced by a suitably large
response time. In step S209, for each server which is
represented in the solution vector, the client with the
worst communications overhead delay is found from C and
this is added to the relevant entry in T. In step S211 the
largest value in T plus 10% of the value calculated in
step S210 is then output as the performance value for
this database configuration.
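
The "plus all" and "plus 10%" variants differ only in the fraction of the weighted average that is added, which the following sketch makes explicit. It is an illustration under stated assumptions: the text does not define the weights, so the interaction rates in L are assumed here to express "how often the communication links are used", and all function and argument names are hypothetical.

```python
def weighted_comm_average(C, L, used_servers):
    """Average of the communication times in C over the used servers,
    weighted by the interaction rates in L (an assumed choice of weight)."""
    weighted_sum = 0.0
    total_weight = 0.0
    for c_row, l_row in zip(C, L):
        for s in used_servers:
            weighted_sum += c_row[s] * l_row[s]
            total_weight += l_row[s]
    return weighted_sum / total_weight if total_weight else 0.0


def plus_all_cost(T, C, L, used_servers, fraction=1.0):
    """Largest used-server time plus a fraction of the weighted average.

    fraction=1.0 corresponds to "plus all" (S190, S191);
    fraction=0.1 corresponds to "plus 10%" (S210, S211).
    T is assumed to already include the worst communication delay.
    """
    worst = max(T[s] for s in used_servers)
    return worst + fraction * weighted_comm_average(C, L, used_servers)
```

Keeping the fraction as a parameter means a single routine can serve both flow diagrams.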

So far the techniques described for obtaining the
new solution vectors for use in the evaluation methods
of Figures 11 and 17 to 22 have been evolutionary
techniques wherein a genetic algorithm is used and thus
the technique of Figures 13 and 15 is used with two
parent solution vectors. The present invention is not
however limited to evolutionary techniques and is
applicable to any search technique which can operate on
a solution vector. Two non-evolutionary techniques will
now be described.

Figure 23 is a flow diagram of a Hill Climbing
search technique wherein only one parent solution vector
is used. In this non-evolutionary technique there is no
"population".

In step S220 a random solution vector is generated
initially and this becomes the current solution vector.
In step S221 an iteration counter m is set to zero. In
step S222 the current solution vector (a randomly
generated solution vector) is evaluated using the methods
of any one of Figures 11 and 17 to 22 to return a fitness
value. The process then enters an iterative loop where
in step S223 the iteration counter m is incremented. In
step S224 it is determined whether the number of
iterations has been completed, i.e. m = M where M is the
total number of iterations to be carried out. If the
number of iterations has been completed, in step S225 the
current solution vector is output as the optimum solution
vector. If not, in step S226 a mutant solution vector
is created from the current solution vector as will be
described in more detail with reference to Figures 25 and
26. In step S227 the mutant solution vector is evaluated
to return a fitness value. The fitness value for the
mutant solution vector is then compared with the fitness
value for the current solution vector and if it is better
then in step S229 the solution vector for the mutant is
set as the current solution vector and the process
returns to step S223. If not, the current solution vector
is kept and the mutant solution vector is discarded and
the process returns to step S223.
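
The loop of Figure 23 can be sketched as below. This is a minimal illustration rather than a definitive implementation: the evaluation and mutation routines are passed in as hypothetical callables, and a lower fitness value is assumed to be better.

```python
import random


def hill_climb(evaluate, mutate, start, total_iterations, rng=random):
    """Keep a mutant only when it improves on the current solution.

    evaluate(vector) -> fitness and mutate(vector, rng) -> new vector are
    hypothetical callables; lower fitness is assumed to be better.
    """
    current = list(start)                     # S220 (start assumed random)
    current_fitness = evaluate(current)       # S222
    m = 0                                     # S221
    while True:
        m += 1                                # S223
        if m >= total_iterations:             # S224: m = M
            return current, current_fitness   # S225
        mutant = mutate(current, rng)         # S226
        mutant_fitness = evaluate(mutant)     # S227
        if mutant_fitness < current_fitness:
            # S229: the mutant becomes the current solution vector
            current, current_fitness = mutant, mutant_fitness
```

Since the counter is incremented before the test in step S224, this sketch evaluates M - 1 mutants for a total of M iterations.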

In accordance with this technique only improvements
to the current solution vector are kept. This technique,
although simple, suffers from the disadvantage that if
there is a localised minimum (or maximum if a maximum is
to be found) in the search space, the Hill Climbing
technique can determine the optimum solution to lie at
the localised minimum rather than at the global minimum.

Figure 24 is a flow diagram of another non-evolutionary
search technique termed Simulated Annealing.

In step S230 a random solution vector is generated
to become the current solution vector. In step S231 the
iteration counter m is set to 0. In step S232 the current
solution vector is evaluated using the techniques of any
one of Figures 11 and 17 to 22 in order to return a
fitness value. In step S233 the process then enters an
iteration loop wherein the iteration counter m is
incremented. In step S234 it is then determined whether
the iteration loop is complete, i.e. m = M, where M is the
total number of iterations to be carried out. If so, in
step S235 the current solution vector is output as the
optimum solution vector. If not, in step S236 a mutant
solution vector is created from the current solution
vector as will be described in more detail hereinafter
with reference to Figures 25 and 26. In step S237 the
mutant solution vector is then evaluated to return a
fitness value. In step S238 it is determined whether the
fitness value for the mutant solution vector is better
than the fitness value for the current solution vector.
If it is, in step S239 the solution vector for the
mutant is set as the current solution and the process
returns to step S233. If it is not, in step S240 the
following calculation is made:

d = e^{(f_c - f_m)/t}                              (5)

where f_c is the fitness of the current best solution,
f_m is the fitness of the mutant,

and t is a "temperature" which starts high e.g. 10^ 
and decays by a geometric cooling schedule.

In step S241 a random number n between 0 and 1 is
then generated and in step S242 it is then determined
whether n > d. If not, in step S243 the solution vector for
the mutant is set as the current solution vector and the
process returns to step S233. If so, the process returns
to step S233 retaining the current solution vector and
discarding the mutant solution vector.
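
The acceptance test of steps S238 to S243 can be sketched as follows. The sketch assumes that equation 5 takes the form d = exp((f_c - f_m)/t), with f_c the fitness of the current solution and f_m the fitness of the mutant, and that a lower fitness is better, so that d lies between 0 and 1 for a worse mutant. The starting temperature and cooling factor are hypothetical values chosen for illustration.

```python
import math
import random


def accept_mutant(f_current, f_mutant, t, rng=random):
    """Return True when the mutant should replace the current solution."""
    if f_mutant < f_current:                     # S238, S239: improvement
        return True
    d = math.exp((f_current - f_mutant) / t)     # S240: assumed form of eq. 5
    n = rng.random()                             # S241
    return not (n > d)                           # S242, S243


def simulated_annealing(evaluate, mutate, start, total_iterations,
                        t0=1000.0, alpha=0.99, rng=random):
    """Figure 24 loop with a geometric cooling schedule (t0, alpha assumed)."""
    current, f_cur = list(start), evaluate(start)   # S230, S232
    t = t0
    for _ in range(1, total_iterations):            # S233, S234
        mutant = mutate(current, rng)               # S236
        f_mut = evaluate(mutant)                    # S237
        if accept_mutant(f_cur, f_mut, t, rng):
            current, f_cur = mutant, f_mut
        t = max(t * alpha, 1e-12)                   # geometric decay, kept > 0
    return current, f_cur                           # S235
```

At a high temperature d is close to 1 and most worse mutants are accepted; as t decays, d falls towards 0 and the search behaves increasingly like Hill Climbing.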

The Simulated Annealing technique has the advantage
over the Hill Climbing technique in that a worse solution
can be accepted in the search procedure, thereby allowing
the search process to escape from a localised minimum in
the search for the global minimum.

The technique for generating the mutant solution
vector will now be described with reference to Figures
25 and 26.

In step S301, for the single parent, an initial child
is generated as an exact copy of the parent. In step
S302, from a random start position, a portion of random
length is selected from the parent as an overlay portion.
In Figure 26 it can be seen that the initial start
position is 8 and the length of the overlay portion is
5. In step S303 the portion is then overlaid in the
initial child on a portion of the initial child of the
same length at a random start position to generate an
intermediate child. As can be seen in Figure 26 the
random overlay position is 4. In step S304 a value in
the child is then randomly mutated to generate a
resulting child, which is the mutant solution vector. In
Figure 26 the first value in the string is changed from
A to G.
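
Steps S301 to S304 can be sketched as follows. This is an illustration under stated assumptions: positions are counted from zero, the string is treated as circular so that a portion may wrap from the last value to the first (as contemplated in claim 2), and the twelve-letter string used in the usage note is hypothetical rather than the string shown in Figure 26.

```python
import random


def overlay(parent, start, length, target):
    """Copy `length` values beginning at `start` over the values beginning
    at `target`, wrapping around the end of the string."""
    n = len(parent)
    child = list(parent)                               # S301: exact copy
    for i in range(length):
        # Read from the parent so overlapping portions do not interfere
        child[(target + i) % n] = parent[(start + i) % n]   # S302, S303
    return child


def make_mutant(parent, alphabet, rng=random):
    """Generate a mutant solution vector from a single parent."""
    n = len(parent)
    start = rng.randrange(n)       # random start position
    length = rng.randint(1, n)     # random length of the overlay portion
    target = rng.randrange(n)      # random overlay position
    child = overlay(parent, start, length, target)
    child[rng.randrange(n)] = rng.choice(alphabet)     # S304: random mutation
    return child
```

For example, with the hypothetical parent "ABCDEFGHIJKL", `overlay(list("ABCDEFGHIJKL"), 8, 5, 4)` copies the values I, J, K, L, A over positions 4 to 8 and gives "ABCDIJKLAJKL".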

The use of the technique of the present invention
in conjunction with a Tournament genetic algorithm has
been evaluated in comparison with a conventional Hill
Climbing technique, a Simulated Annealing technique, a
Breeder Genetic Algorithm technique using conventional
uniform cross over to generate a new solution vector and
a Tournament genetic algorithm also using the uniform
cross over technique to generate the new solution vector.
Results have been obtained for scenario A as illustrated
in Figure 9 and scenario B as illustrated in Figure 16.

Figure 27 shows the results for the five optimisers
for scenario A with the basic "least worst" server model.
For each of 1000 runs, the fitness of the best solution
was noted. The diagram firstly shows the percentage of
these 1000 runs that found the known globally optimum
solution value (referred to as "On Target" results). The
second group of columns shows the percentage of times
that the fitness of the best found solution was within
5% of the known globally optimal solution (acceptable in
an industrial context). The third set of columns shows the
percentage of times that the fitness of the best found
solution was more than 30% greater (given that lowest
fitness value here is best) than the known globally
optimal solution (deemed unacceptable in an industrial
context). In Figures 27 to 29, the following
abbreviations have been used:

HC  - Hill Climbing technique
SA  - Simulated Annealing technique
BDR - Breeder genetic algorithm technique using
      uniform cross over
TNT - Tournament genetic algorithm technique
      using uniform cross-over
SKT - Tournament genetic algorithm using the
      technique of the present invention for
      generating a new solution vector.

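
The three percentages plotted for each optimiser can be computed from the best fitness of each run as sketched below. The sketch assumes a minimisation problem with a positive known optimum, as in the description of Figure 27; the function name and the sample values in the usage note are hypothetical and are not results from the figures.

```python
def summarise_runs(best_fitnesses, optimum):
    """Percentage of runs that are on target, within 5% of the optimum,
    and more than 30% worse than the optimum (lower fitness is better)."""
    runs = len(best_fitnesses)
    on_target = sum(1 for f in best_fitnesses if f == optimum)
    within_5 = sum(1 for f in best_fitnesses if f <= optimum * 1.05)
    over_30 = sum(1 for f in best_fitnesses if f > optimum * 1.30)
    return {
        "on_target": 100.0 * on_target / runs,
        "within_5_percent": 100.0 * within_5 / runs,
        "over_30_percent": 100.0 * over_30 / runs,
    }
```

For five hypothetical runs with best fitness values 100, 100, 104, 120 and 140 against an optimum of 100, the summary is 40% on target, 60% within 5% and 20% more than 30% worse. Note that the on target runs are also counted in the within 5% group in this sketch.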
As can be seen in Figure 27 whilst the tournament 
genetic algorithm using the technique of the present 
invention is well behind at 1000 evaluations, it still 
gives good results at 5000 evaluations. 

Figure 28 shows the results for scenario A using the
"plus all" model variant (seen as more indicative of user
perceived quality of service). Once again, although the
tournament genetic algorithm technique employing the
technique of the present invention to generate a new
solution vector gives unimpressive results at 1000
evaluations, good results were obtained at 5000
evaluations.

Figure 29 shows results for scenario B with the
basic "least worst" server model. Once again, although the
tournament genetic algorithm employing the technique of
the present invention to generate the new solution vector
is unimpressive at one thousand evaluations, the results
are good for five thousand evaluations.

Figure 30 shows the results for scenario B with the
"plus all" model. This problem clearly gives most
optimisers a large amount of difficulty. At 1000
evaluations, performance is wholly unacceptable. However,
the tournament genetic algorithm technique utilising the
technique of the present invention to generate the new
solution vector provides good results at 5000 evaluations.
Thus, as can be seen in Figures 27 to 30, this technique
is the only one which is able to give any degree of
consistent performance, which is critical in an industrial
context.

Although the present invention has been described 
hereinabove with reference to its application to the 
optimization of the configuration of a distributed 
database, the technique of the present invention is 
widely applicable to the optimization of any physical 
system. The technique for generating a new solution 
vector exploits a particular feature of the chosen 
representation namely that good contiguous chunks of 
allele values in one part of the chromosome may also work 
well as a building block elsewhere in the chromosome. 

The present invention can be applied to the
optimisation of a distributed processing system of which
a distributed database is one example. In such a system
applications (clients) requiring processing to be carried
out at nodes (servers) in a network refer their
processing requests to particular nodes. It is this
configuration which can be adapted using the present
invention in order to distribute the processing load to
improve the system performance. In the system the
operating parameters can comprise transaction or
processing rates and network communication times can be
used in the model as for the distributed database
embodiment.

The technique has also been applied to a hard
benchmark problem in structural chemistry involving
finding the two dimensional structure which minimizes the
energy (as estimated by Lennard-Jones potentials) of a
bonded string of atoms. Here the chromosome represents
a list of adjacent bond angles and the true optimum in
each case is a close-to-spiral structure in which the
adjacent angles are similar to each other (but not
exactly the same), with repeated contiguous patterns of
angles along the string. Coded as a maximisation
problem, tables 7 and 8 below indicate the results over
ten trials for both ten and twenty atom cases for a
tournament genetic algorithm using one point (1 pt), two
point (2 pt), uniform (unif), and the inventive
cross-over technique (skew).

Table 7

10 atoms:

1 pt:   20  19  19  19  19  19  19  19  19  19
2 pt:   19  18  19  18  20  19  18  19  19  19
unif:   19  20  19  17  19  18  18  19  19  20
skew:   20  20  20  20  20  20  20  20  20  20

Table 8

20 atoms:

1 pt:   37  40  38  37  41  41  20  42  37  39
2 pt:   41  39  40  39  40  41  40  40  41  42
unif:   43  43  37  39  41  41  39  42  41  42
skew:   47  47  45  47  47  46  46  45  47  47

It can be seen from tables 7 and 8 that the
technique of the present invention provides the best
solution to this problem both in terms of best value
found and repeatability of results, indicating the wider
applicability of this technique.
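
The comparison drawn from the twenty atom case can be checked with a short calculation over the values listed in Table 8. The sketch below is illustrative only: the dictionary layout and function name are assumptions, and the last row of the table is taken to be the "skew" row.

```python
# Values from Table 8 (20 atoms, ten trials, maximisation problem)
TABLE_8 = {
    "1 pt": [37, 40, 38, 37, 41, 41, 20, 42, 37, 39],
    "2 pt": [41, 39, 40, 39, 40, 41, 40, 40, 41, 42],
    "unif": [43, 43, 37, 39, 41, 41, 39, 42, 41, 42],
    "skew": [47, 47, 45, 47, 47, 46, 46, 45, 47, 47],
}


def summarise(table):
    """Return (worst, best, mean) of the trial results for each operator."""
    return {
        name: (min(values), max(values), sum(values) / len(values))
        for name, values in table.items()
    }


summary = summarise(TABLE_8)
best_operator = max(summary, key=lambda name: summary[name][2])
```

On these values the worst "skew" trial (45) exceeds the best trial of every other operator (43 at most), which is the repeatability point made above.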



A further application of this technique is to the
configuration of switched networks wherein information
units comprise call routing information for controlling
the switching of network switches to route calls. An
example of where usage would change in such a network is
where a company decides to offer a special discount on
calls in an under utilised part of the communications
network at a time later that day. A system could adapt
itself to allow efficient access of the appropriate data
for the duration of the discount in advance and show that
the system is optimised to meet the expected demand. In
another example, in a mobile telephone network where a
customer moves from one city to another, their data
records can follow so that they are close by rather than
being accessed slowly across large distances. This can
take place by monitoring retrieval performance of the
data for the customer and adapting the distribution of
data to include the retrieval performance.

Clearly the present invention has a wide range of 
applications and is particularly suited to optimization 
techniques with complex cost-values which have a complex 
search space e.g. more than one minimum or maximum. 

The application of the technique in genetic
algorithms can be tailored to suit the problem by
selecting the mutation rate and population size
accordingly.

As has been shown, the present invention is
particularly suited to optimizing a quality of service
metric in a distributed database taking into account both
load on the servers and loads on the links interconnecting
the servers and clients.

Although the present invention has been described
hereinabove with reference to allelic representations of the
solution vector, the present invention is not limited to such
representations and is applicable to canonical
representations.

The present invention can be implemented as a computer
program operating on a standard computer and thus the present
invention can be embodied as a storage medium containing
processor implementable instructions for controlling a
processor to carry out the method. Further, since the
computer code can be transmitted in electronic form, for
example by being downloaded over a network, the present
invention can be embodied as an electronic signal carrying
computer code for controlling a computer to carry out the
method.

Although the present invention has been described
hereinabove with reference to specific embodiments, the
present invention is not limited to such embodiments and it
would be apparent to a skilled person in the art that
modifications can be made within the spirit and scope of the
appended claims.

CLAIMS

1. Data processing apparatus for determining optimum
parameters of a model of a physical system, the apparatus
comprising:

(a) obtaining means for obtaining at least one
initial string of values representing the
parameters of the model to be optimised;

(b) first determining means for determining a cost
value associated with the model having parameters
represented by the or each string of values;

(c) generating means for repeatedly generating a
new string of values by selecting a sequence of
values of random length, starting at a random
position in a said string of values, replacing a
sequence of values of the same length in a said
string of values at a random position, and changing
the value of one or more of the values of the
resulting string of values to generate a new string
of values;

wherein said first determining means is adapted to
determine a cost value associated with the model
having parameters represented by the new string of
values; and

(d) second determining means for determining the
optimum parameters as one of said initial or new
string of values for which the cost value
associated with the model having the optimum
parameters is closest to a target.

2. Data processing apparatus according to claim 1 wherein
said generating means is adapted to consider a last value
in a said string of values as being sequential to a first
value in said string of values such that said selected
sequence of values can include said last and first values
sequentially in a said string of values and the sequence
of values to be replaced can include said last and first
values sequentially in a said string of values.

3. Data processing apparatus according to claim 1 or claim
2, wherein said obtaining means is adapted to randomly
obtain values for said at least one initial string of
values.

4. Data processing apparatus according to any preceding
claim wherein said obtaining means is adapted to obtain a
plurality of said initial strings of values to form a
population, and said generating means is adapted to
perform a genetic algorithm on said population by
repeatedly generating a new string of values by

(a) selecting a sequence of values of random
length starting at a random position in a first
said string of values,

(b) replacing a sequence of values of the same
length in a second said string of values at a
random position, and

(c) mutating the value of one or more of the
values of the resulting string of values to
generate said new string of values.

5. Data processing apparatus according to claim 4, wherein
said generating means is adapted to repeatedly

(a) delete a proportion of the population for
which the cost values are furthest from said
target, and

(b) perform said repeated generation of new
strings of values until the population is
regenerated; and

(c) to select said first and second strings of
values randomly from the undeleted proportion of
the population for which the cost values are
closest to said target.

6. Data processing apparatus according to claim 4, wherein
said generating means is adapted to repeatedly

(a) randomly select three said strings of values
from said population,

(b) select two of the selected strings of values
for which the cost values are closest to the target
as said first and second strings of values, and

(c) replace the third of said selected strings of
values with the generated new string of values.

7. Data processing apparatus according to any one of claims
1 to 3, wherein said obtaining means is adapted to obtain
a single said initial string of values.

8. Data processing apparatus according to claim 7, wherein
said generating means is adapted to repeatedly replace
said single string of values with said new string of
values if the cost value associated with the model having
said new string of parameters is closer to said target.

9. Data processing apparatus according to claim 8 wherein
said generating means is adapted to repeatedly replace
said parent string of values with said new string of
values if the exponential of the difference in the cost
values for said parent string of values and said new
string of values divided by a factor dependent upon the
number of repetitions by said generating means is greater
than a random number between 0 and 1.



10. Data processing apparatus according to any one of claims
1 to 9, wherein the parameters include configuration
parameters of a physical system, the apparatus further
comprising control means for controlling the
configuration of a physical system in accordance with
optimum configuration parameters.



11. Data processing apparatus according to claim 10 including
means for receiving operating parameters from said
physical system such that the model of a physical system
is adapted to model said physical system using the
received operating parameters.



12. Data processing apparatus according to any one of claims
1 to 9, further comprising control means for configuring
a distributed database, the distributed database
comprising a plurality of data servers connected over a
network, each data server holding any number of data
units, and a plurality of clients connected over the
network to the data servers, each of said clients being
adapted to retrieve data units from and/or update data
units at one or more of the data servers, wherein the
parameters of the model to be optimised include
configuration parameters defining which data server is to
be accessed by which client, and the first determining
means is arranged to utilise the configuration parameters
and information on the passage of data within the
distributed database to determine a cost value, said
control means being arranged to control which data server
is to be accessed by which client in accordance with
determined optimum configuration parameters.

13. Data processing apparatus according to claim 12, wherein
said control means is adapted to copy, move and/or delete 
data units at data servers to change the distribution of 
the data units across the data servers and/or to change 
the data servers to be accessed by said clients using the 
determined optimum configuration parameters to improve 
the system performance. 

14. Data processing apparatus according to claim 12 or claim
13, wherein said first determining means is adapted to
determine said cost value in dependence upon the time
taken for one or more said clients to retrieve and/or
update data units at one or more data servers.

15. Data processing apparatus according to claim 14, wherein 
the first determining means is adapted to determine said 
cost value as any one of: the longest response time for 
any said client to access any said data server, or the 
longest response time for any said client to access any 
said data server and a proportion of the average of the 
response times for all said data servers, or as a 
function of the rates at which data can be retrieved by 
said clients from said data servers, the rates at which
data can be updated by said clients at said data servers,
the contention between said clients for accessing said
data servers, and communications times between said
clients and said data servers.

16. Data processing apparatus according to any one of claims
12 to 15 including monitoring means for monitoring the
passage of data over the network, wherein said first
determining means is adapted to use the output of said 
monitoring means to determine said cost values, and said 
control means is adapted to adaptively configure the 
distributed database. 

17. Data processing apparatus according to any one of claims
1 to 9, further comprising control means for configuring
a distributed processing system, the distributed
processing system comprising a plurality of servers
connected over a network, each server being capable of
carrying out any number of processing operations, and a
plurality of clients connected over the network to the
servers, each of said clients being adapted to request
processing to be carried out by one or more of the
servers, wherein the parameters of the model to be
optimised include configuration parameters defining which
server is to be accessed by which client, and the first
determining means is arranged to utilise the
configuration parameters and information on the
communications speed within the distributed processing
system to determine a cost value, the control means being
arranged to control which server is to be accessed by
which client in accordance with determined optimum
configuration parameters.

18. Data processing apparatus according to claim 17, wherein
said first determining means is adapted to determine said
cost value in dependence upon the transaction times by
said servers.

19. Data processing apparatus according to claim 18, wherein
said first determining means is adapted to determine said
cost value as any one of: the longest transaction time by
any said server, or the longest transaction time by any
said data server and a proportion of the average of the
transaction times for all said servers.

20. Data processing apparatus according to any one of claims
17 to 19 including monitoring means for monitoring the
transaction times of said servers, wherein said first
determining means is adapted to use the output of said
monitoring means to determine said cost values, and said
control means is adapted to adaptively configure the
distributed processing system.

21. A processor implemented method of determining optimum
parameters of a model of a physical system, the method
comprising:

(a) obtaining at least one initial string of values 
representing the parameters of the model to be 
optimised; 

(b) determining a cost value associated with the model 
having parameters represented by the or each 
string of values; 

(c) repeatedly generating a new string of values by
selecting a sequence of values of random length
starting at a random position in a said string of
values, replacing a sequence of values of the same
length in a said string of values at a random
position and changing the value of one or more of
the values of the resulting string of values to
generate a new string of values;

(d) determining a cost value associated with the model 
having parameters represented by the new string of 
values; and 

(e) determining the optimum parameters as one of said 
initial or new string of values for which the cost 
value is closest to a target.

22. A method according to claim 21, wherein in the generating
step a last value in a said string of values is 
considered as being sequential to a first value in said 
string of values such that said selected sequence of 
values can include said last and first values 
sequentially in a said string of values and the sequence 
of values to be replaced can include said last and first 
values sequentially in a said string of values. 
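
The generating step of claims 21 and 22 can be illustrated with a minimal Python sketch. It is not taken from the specification: the function name `splice_and_mutate`, the `value_range` and `mutation_count` parameters, and the choice of a run length between 1 and the full string length are all assumptions made for illustration.

```python
import random

def splice_and_mutate(donor, recipient, value_range, mutation_count=1, rng=random):
    """Copy a random-length run from donor into recipient, treating both
    strings as circular, then randomly change some values of the result.
    All names and defaults here are illustrative assumptions."""
    n = len(donor)
    if n == 0 or n != len(recipient):
        raise ValueError("strings must be non-empty and of equal length")
    length = rng.randint(1, n)   # random run length (assumed range 1..n)
    src = rng.randrange(n)       # random start position in the donor
    dst = rng.randrange(n)       # random start position in the recipient
    child = list(recipient)      # work on a copy; inputs are left untouched
    for k in range(length):
        # Modulo indexing makes the last value sequential to the first,
        # so a run may wrap around the end of either string.
        child[(dst + k) % n] = donor[(src + k) % n]
    for _ in range(mutation_count):
        # Change the value at a randomly chosen position.
        child[rng.randrange(n)] = rng.choice(value_range)
    return child
```

Passing the same string as both `donor` and `recipient` gives the single-parent variant, in which a run is copied from one position of the string to another.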

23. A method according to claim 21 or claim 22, wherein the
values for said at least one initial string of values is
randomly obtained.

24. A method according to any one of claims 21 to 23, wherein
a plurality of said initial strings of values are
obtained to form a population, and the repeated
generating step comprises selecting a sequence of values
of random length starting at a random position in a first
said string of values, replacing a sequence of values of
the same length in a second said string of values, and
changing the value of one or more of the values of the
resulting string of values to generate said new string of
values.

25. A method according to claim 24, including repeatedly: 

(a) deleting a proportion of the population for which 
the cost value is furthest from said target; 

(b) performing said repeated generating step until the
population is regenerated; and

(c) selecting said first and second strings of values
randomly from the proportion of the population for
which the cost values are closest to said target.
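
One cycle of the delete-and-regenerate scheme of claim 25 might look like the following Python sketch. The names `regenerate`, `cost`, `breed` and `cull_fraction` are hypothetical, the target is assumed to be a numeric cost value, and drawing parents only from the surviving strings (never from strings generated in the same cycle) is one possible reading rather than a requirement stated in the claim.

```python
import random

def regenerate(population, cost, breed, target=0.0, cull_fraction=0.5, rng=random):
    """One cull-and-refill cycle (illustrative assumptions throughout).

    cost(string) returns a number; breed(first, second) returns a new string.
    """
    size = len(population)
    # Rank strings by how close their cost value is to the target.
    ranked = sorted(population, key=lambda s: abs(cost(s) - target))
    # Delete the proportion furthest from the target, keeping at least two parents.
    keep = max(2, size - int(size * cull_fraction))
    survivors = ranked[:keep]
    children = []
    while len(survivors) + len(children) < size:
        # First and second strings are chosen at random from the survivors.
        first, second = rng.sample(survivors, 2)
        children.append(breed(first, second))
    return survivors + children
```

The population size is preserved, so the function can be called repeatedly with its own output.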

26. A method according to claim 24, including repeatedly:

(a) randomly selecting three said strings of values from
said population;

(b) selecting two of the selected strings of values for
which the cost values are closest to said target as said
first and second strings of values for use in the
generating step; and

(c) replacing the third of said selected strings of values
with the generated new string of values.
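
The three-string selection of claim 26 can be sketched as below. This is an illustration only: `tournament_step`, `cost` and `breed` are hypothetical names, and the target is assumed to be a numeric cost value.

```python
import random

def tournament_step(population, cost, breed, target=0.0, rng=random):
    """Pick three strings at random; the two closest to the target are used
    to generate a new string, which overwrites the third.
    Modifies population in place and returns the replaced index."""
    picks = rng.sample(range(len(population)), 3)
    # Order the three picked indices from closest to furthest from the target.
    picks.sort(key=lambda i: abs(cost(population[i]) - target))
    first, second, loser = picks
    population[loser] = breed(population[first], population[second])
    return loser
```

Because only one string changes per call, the population size stays constant without a separate regeneration phase.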

27. A method according to any one of claims 21 to 23, wherein
a single said initial string of values is obtained as a
parent string of values, and the repeated generating step
comprises selecting a sequence of values of random length
starting at a random position in said parent string of
values, replacing a sequence of values of the same length
in said parent string of values at a random position, and
changing the value of one or more of the values of the
resulting string of values to generate said new string of
values.

28. A method according to claim 27, including repeatedly
replacing said parent string of values with said new
string of values if the cost value for said new string of
values is closer to said target.

29. A method according to claim 28, including repeatedly
replacing said parent string of values with said new
string of values if the exponential of the difference in
the cost values for said parent and new strings of values
divided by a factor dependent upon the number of
repetitions of the generating step is greater than a
randomly generated number between 0 and 1.
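
The acceptance rule of claims 28 and 29 resembles a simulated-annealing test and can be sketched as follows. Several points are assumptions not fixed by the claims: that the target is the lowest cost, that the difference is taken as parent cost minus new cost, and that the repetition-dependent factor is a geometrically decaying "temperature" (`initial_temperature * cooling ** repetition`). The function and parameter names are hypothetical.

```python
import math
import random

def accept(parent_cost, new_cost, repetition, initial_temperature=1.0,
           cooling=0.99, rng=random):
    """Return True if the new string should replace the parent.
    Assumes lower cost is better (an illustrative assumption)."""
    if new_cost <= parent_cost:
        # An improvement gives exp(x) >= 1 for x >= 0, which always exceeds a
        # random number in [0, 1); returning early also avoids overflow.
        return True
    # Assumed factor: shrinks as the number of repetitions grows.
    temperature = max(initial_temperature * cooling ** repetition, 1e-12)
    # A worse string is accepted with probability exp(-(increase) / temperature).
    return math.exp((parent_cost - new_cost) / temperature) > rng.random()
```

Early in the run a worse string is often accepted, which helps escape local optima; as `repetition` grows the test approaches the strict rule of claim 28.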

30. A method according to any one of claims 21 to 29,
including applying the method to configure a physical
system in which the parameters include configuration
parameters of a physical system, the configuration
including controlling a physical system in accordance
with determined optimum configuration parameters.

31. A method according to claim 30, including receiving
operating parameters from said physical system, wherein 
the determining step includes using the received 
operating parameters to model said physical system. 

32. A method according to any one of claims 21 to 29,
including applying the method to configure a distributed
database comprising a plurality of data servers connected
over a network, each data server holding any number of
data units, and a plurality of clients connected over
said network to said data servers, each of said clients
being adapted to retrieve data units from and/or update
data units at one or more of the data servers, in which
the parameters of the model to be optimised include
configuration parameters defining which data server is to
be accessed by which client, said configuration of the
distributed database including controlling which data
server is to be accessed by which client in accordance
with determined optimum configuration parameters.



33. A method according to claim 32, wherein configuring the
distributed database comprises copying, moving and/or
deleting data units at data servers to change the
distribution of data units across the data servers,
and/or changing the data servers to be accessed by said
clients using the determined optimum configuration
parameters to improve the system performance.

34. A method according to either of claims 32 or 33, wherein
said cost values are determined in dependence upon the
time taken for one or more said clients to retrieve
and/or update data units at one or more data servers.



35. A method according to claim 34, wherein the cost value is
determined as any one of: the longest response time for
any said client means to access any said data server, or
the longest response time for any said data server and a
proportion of the average of the response times for all
said data servers, or as a function of the rates at which
data can be retrieved by said clients from said data
servers, the rates at which data can be updated by said
clients at said data servers, the contention between said
clients for accessing said data servers, and
communication times between said clients and said data
servers.
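
The first two cost options of claim 35 can be written as a single Python function. This is a sketch under assumptions: the name `response_time_cost` and the `average_weight` parameter are hypothetical, and "and a proportion of the average" is read here as adding a weighted mean to the longest time.

```python
def response_time_cost(response_times, average_weight=0.0):
    """Longest response time plus a weighted share of the mean response time.

    average_weight=0.0 gives the pure worst-case cost; a positive weight also
    rewards configurations that are fast on average (assumed combination).
    """
    times = list(response_times)
    if not times:
        raise ValueError("at least one response time is required")
    return max(times) + average_weight * (sum(times) / len(times))
```

For example, response times of 2, 4 and 6 give a cost of 6 with no average term, and 6 + 0.5 × 4 = 8 with `average_weight=0.5`.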

36. A method according to any one of claims 32 to 35
including the step of monitoring the passage of 
information over the network, wherein the cost value is 
determined in accordance with the monitoring, and the 
configuration of the distributed database is adaptively 
controlled. 

37. A method according to any one of claims 21 to 29,
including applying the method to configure a distributed
processing system comprising a plurality of servers
connected over a network, each server being capable of
carrying out any number of processing operations, and a
plurality of clients connected over said network to said
servers, each of said clients being adapted to request
processing to be carried out by one or more of the
servers, in which the parameters of the model to be
optimised include configuration parameters defining which
server is to be accessed by which client, and the
configuration including controlling which server is to be
accessed by which client in accordance with determined
optimum configuration parameters.

38. A method according to claim 37, wherein said cost value
is determined in dependence upon the transaction time
taken for said servers.

39. A method according to either of claims 37 or 38, including
the step of monitoring the transaction times of said
servers, wherein the cost value is determined in
accordance with the monitoring, and the configuration of
the distributed processing system is adaptively
controlled.

40. A storage medium storing processor implementable
instructions for controlling a processor to carry out the
method of any one of claims 21 to 39.

41. An electronic signal carrying computer code for
controlling a processor to carry out the method of any
one of claims 21 to 39.

ABSTRACT

DATA PROCESSING APPARATUS AND METHOD FOR 
OPTIMISING CONFIGURATION PARAMETERS 
OF A PHYSICAL SYSTEM 



A technique for determining optimum parameters of a model 
of a physical system is disclosed in which at least one 
initial string of values representing the parameters of 
the model to be optimised is obtained. A cost value 
associated with the model having parameters represented 
by the or each string of values is determined. A new string
of values is repeatedly generated by selecting a sequence 
of values of random length, starting at a random position 
in a string of values, replacing a sequence of values of 
the same length in a string of values at a random 
position, and changing the value of one or more of the 
values of the resulting string of values to generate a 
new string of values. The cost value associated with the 
model having parameters represented by the new string of 
values is determined and the optimum parameters are 
determined as one of the initial or new string of values 
for which the cost value associated with the model having 
the optimum parameters is closest to a target.

RULE 63 (37 C.F.R. 1.63)
DECLARATION AND POWER OF ATTORNEY 
FOR PATENT APPLICATION 
IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

As a below named inventor, I hereby declare that my residence, post office address and citizenship are as stated below next to my name, and I believe I am 
the original first and sole inventor (if only one name is listed below) or an original, first and joint inventor (if plural names are listed below) of the subject 
matter which is claimed and for which a patent is sought on the invention entitled: DATA PROCESSING APPARATUS AND METHOD FOR
OPTIMISING CONFIGURATION PARAMETERS OF A PHYSICAL SYSTEM
the specification of which (check applicable box(es)):
[ ] is attached hereto.

[ ] was filed on ____ as U.S. Application Serial No. ____

[x] was filed as PCT international application No. PCT/GB99/02449 on 27 JULY 1999

and (if applicable to U.S. or PCT application ) was amended on 

I hereby state that I have reviewed and understand the contents of the above identified specification, including the claims, as amended by any amendment 
referred to above. I acknowledge the duty to disclose information which is material to the examination of this application in accordance with 37 C.F.R. 
1.56. I hereby claim foreign priority benefits under 35 U.S.C. 119/365 of any foreign application(s) for patent or inventor's certificate listed below and have
also identified below any foreign application for patent or inventor's certificate having a filing date before that of the application on which priority is 
claimed or, if no priority is claimed, before the filing date of this application: 

Prior Foreign Application(s): 

Application Number    Country          Day/Month/Year Filed
9817051.7             GREAT BRITAIN    5 AUGUST 1998

I hereby claim the benefit under 35 U.S.C. §119(e) of any United States provisional application(s) listed below.
Application Number    Day/Month/Year Filed



I hereby claim the benefit under 35 U.S.C. 120/365 of all prior United States and PCT international applications listed above or below and, insofar as the
subject matter of each of the claims of this application is not disclosed in such prior applications in the manner provided by the first paragraph of 35 U.S.C.
112, I acknowledge the duty to disclose material information as defined in 37 C.F.R. 1.56 which occurred between the filing date of the prior applications
and the national or PCT international filing date of this application:

Prior U.S./PCT Application(s):

Application Serial No.    Day/Month/Year Filed    Status: patented, pending, abandoned
PCT/GB99/02449            27 JULY 1999            pending

I hereby declare that all statements made herein of my own knowledge are true and that statements made on information and belief are believed to be true;
and further that these statements were made with the knowledge that willful false statements and the like so made are punishable by fine or imprisonment, or
both, under Section 1001 of Title 18 of the United States Code and that such willful false statements may jeopardize the validity of the application or any
patent issued thereon. And I hereby appoint Nixon & Vanderhye P.C., [address illegible], telephone
number (703) 816-4000 (to whom all communications are to be directed), and the following attorneys thereof (of the same address) individually and
collectively my attorneys to prosecute this application and to transact all business in the Patent and Trademark Office connected therewith and with the
resulting patent: Arthur R. Crawford, [illegible]; Larry S. Nixon, 25640; Robert A. Vanderhye, [illegible]; James T. Hosmer, [illegible]; Robert W. Faris, [illegible];
Richard G. Besha, 22770; Mark E. Nusbaum, 32348; Michael J. Keenan, [illegible]; Bryan H. Davidson; Stanley C. Spooner, [illegible]; Leonard C.
Mitchard, 29009; Duane [illegible] Byers, 33363; Paul J. Henon, 33626; Jeffry H. Nelson, [illegible]; John R. Lastova, 33149; H. Warren Burnam Jr., [illegible]; Thomas
E. Byrne, 32105; Mary J. Wilson, 32955; J. Scott Davidson, [illegible]; Alan M. Kagen, [illegible]; William J. Griffith, [illegible]; Robert A. Molan, [illegible]





Inventor's Signature:                    Date:

Inventor: MARTIN JOHN OATES              BRITISH

(first) MI (last)                        (citizenship)

Residence: (city) SUFFOLK (state/country) GREAT BRITAIN

Post Office Address: 11 NAUGHTON GARDENS, STOWMARKET, SUFFOLK
(Zip code) IP14 2RA